codeyeeter commented on a change in pull request #1323:
URL: https://github.com/apache/systemds/pull/1323#discussion_r663759655
##########
File path: src/main/python/tests/examples/tutorials/test_adult.py
##########
@@ -386,51 +386,35 @@ def test_level2(self):
         """""
         ################################################################################################################
-        X1, M1 = X1.transform_encode(spec=jspec).compute()
+        X1, M1 = X1.transform_encode(spec=jspec)
         ################################################################################################################
         """"
-        First we re-split out data into a training and a test set with the corresponding labels. We can then simply transform
-        the numpy array of the training data back to SystemDS matrix by using "sds.from_numpy()".
-        The SystemDS scale function takes a matrix as an input and returns three output parameters:
-        # Y Matrix --- Output feature matrix with K columns
-        # ColMean Matrix --- The column means of the input, subtracted if Center was TRUE
-        # ScaleFactor Matrix --- The Scaling of the values, to make each dimension have similar value ranges
-        If we want to retransform a SystemDs Matrix to a Numpy array we can do so by using the np.array() function.
+        First we re-split out data into a training and a test set with the corresponding labels.
         """""
         ################################################################################################################
-        col_length = len(X1[0])
-        X = X1[0:train_count, 0:col_length - 1]
-        Y = X1[0:train_count, col_length - 1:col_length].flatten()
-        # Test data
-        Xt = X1[train_count:train_count + test_count, 0:col_length - 1]
-        Yt = X1[train_count:train_count + test_count, col_length - 1:col_length].flatten()
+        PREPROCESS_package = self.sds.source(self.preprocess_src_path, "preprocess", print_imported_methods=True)
+        X = PREPROCESS_package.get_X(X1, train_count)
+        Y = PREPROCESS_package.get_Y(X1, train_count)
+        #We lose the column count information after using the Preprocess Package. This triggers an error on multilogregpredict. Otherwise its working
+        Xt = self.sds.from_numpy(np.array(PREPROCESS_package.get_Xt(X1, train_count).compute()))

Review comment:
   @Baunsgaard

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@systemds.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org