[GitHub] [systemds] codeyeeter commented on a change in pull request #1323: [WIP][SYSTEMDS-2835] Python end-to-end tutorial

GitBox Tue, 06 Jul 2021 04:13:41 -0700


codeyeeter commented on a change in pull request #1323:
URL: https://github.com/apache/systemds/pull/1323#discussion_r663759655




##########
File path: src/main/python/tests/examples/tutorials/test_adult.py
##########
@@ -386,51 +386,35 @@ def test_level2(self):
 
         """""
         
################################################################################################################
-        X1, M1 = X1.transform_encode(spec=jspec).compute()
+        X1, M1 = X1.transform_encode(spec=jspec)
 
         
################################################################################################################
         """"
-        First we re-split out data into a training and a test set with the corresponding labels. We can then simply transform
-        the numpy array of the training data back to SystemDS matrix by using "sds.from_numpy()".
-        The SystemDS scale function takes a matrix as an input and returns three output parameters:
-            # Y            Matrix    ---      Output feature matrix with K columns
-            # ColMean      Matrix    ---      The column means of the input, subtracted if Center was TRUE
-            # ScaleFactor  Matrix    ---      The Scaling of the values, to make each dimension have similar value ranges
-        If we want to retransform a SystemDs Matrix to a Numpy array we can do so by using the np.array() function.
+        First we re-split out data into a training and a test set with the corresponding labels.
         """""
         
################################################################################################################
-        col_length = len(X1[0])
-        X = X1[0:train_count, 0:col_length - 1]
-        Y = X1[0:train_count, col_length - 1:col_length].flatten()
-        # Test data
-        Xt = X1[train_count:train_count + test_count, 0:col_length - 1]
-        Yt = X1[train_count:train_count + test_count, col_length - 1:col_length].flatten()
+        PREPROCESS_package = self.sds.source(self.preprocess_src_path, "preprocess", print_imported_methods=True)
 
+        X = PREPROCESS_package.get_X(X1, train_count)
+        Y = PREPROCESS_package.get_Y(X1, train_count)
+        #We lose the column count information after using the Preprocess Package. This triggers an error on multilogregpredict. Otherwise its working
+        Xt = self.sds.from_numpy(np.array(PREPROCESS_package.get_Xt(X1, train_count).compute()))

Review comment:
       @Baunsgaard 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@systemds.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
