Repository: incubator-systemml

Updated Branches:
  refs/heads/master 6b377319e -> 1173a5cb2
[SYSTEMML-947] Remove binary block classes from MLContext

Remove BinaryBlockMatrix and BinaryBlockFrame classes from the MLContext API
and incorporate similar functionality into the Matrix and Frame classes.

Closes #531.

Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/1173a5cb
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/1173a5cb
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/1173a5cb

Branch: refs/heads/master
Commit: 1173a5cb215547686ab7593cbd9edef3314fed8d
Parents: 6b37731
Author: Deron Eriksson <[email protected]>
Authored: Wed Jun 7 10:28:44 2017 -0700
Committer: Deron Eriksson <[email protected]>
Committed: Wed Jun 7 10:28:44 2017 -0700

----------------------------------------------------------------------
 docs/spark-mlcontext-programming-guide.md       |  23 +--
 .../sysml/api/mlcontext/BinaryBlockFrame.java   | 181 -------------------
 .../sysml/api/mlcontext/BinaryBlockMatrix.java  | 168 -----------------
 .../org/apache/sysml/api/mlcontext/Frame.java   |  98 +++++++++-
 .../api/mlcontext/MLContextConversionUtil.java  |  88 +++++----
 .../sysml/api/mlcontext/MLContextUtil.java      |  42 +++--
 .../apache/sysml/api/mlcontext/MLResults.java   |  12 --
 .../org/apache/sysml/api/mlcontext/Matrix.java  | 137 +++++++++++++-
 src/main/python/systemml/mlcontext.py           |   2 +-
 .../org/apache/sysml/api/dl/Caffe2DML.scala     |   4 +-
 .../sysml/api/ml/BaseSystemMLClassifier.scala   |  10 +-
 .../sysml/api/ml/BaseSystemMLRegressor.scala    |   8 +-
 .../apache/sysml/api/ml/LinearRegression.scala  |   2 +-
 .../sysml/api/ml/LogisticRegression.scala       |   2 +-
 .../org/apache/sysml/api/ml/NaiveBayes.scala    |  12 +-
 .../apache/sysml/api/ml/PredictionUtils.scala   |  10 +-
 .../scala/org/apache/sysml/api/ml/SVM.scala     |   4 +-
 .../mlcontext/DataFrameVectorScriptTest.java    |  90 +++++---
 .../mlcontext/MLContextOutputBlocksizeTest.java |  37 ++--
 .../integration/mlcontext/MLContextTest.java    |  54 +++---
 20 files
changed, 417 insertions(+), 567 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/1173a5cb/docs/spark-mlcontext-programming-guide.md ---------------------------------------------------------------------- diff --git a/docs/spark-mlcontext-programming-guide.md b/docs/spark-mlcontext-programming-guide.md index c424c70..ddccde1 100644 --- a/docs/spark-mlcontext-programming-guide.md +++ b/docs/spark-mlcontext-programming-guide.md @@ -243,7 +243,7 @@ mean: Double = 0.49996223966662934 Many different types of input and output variables are automatically allowed. These types include `Boolean`, `Long`, `Double`, `String`, `Array[Array[Double]]`, `RDD<String>` and `JavaRDD<String>` -in `CSV` (dense) and `IJV` (sparse) formats, `DataFrame`, `BinaryBlockMatrix`, `Matrix`, and +in `CSV` (dense) and `IJV` (sparse) formats, `DataFrame`, `Matrix`, and `Frame`. RDDs and JavaRDDs are assumed to be CSV format unless MatrixMetadata is supplied indicating IJV format. @@ -1606,11 +1606,7 @@ Therefore, if you use a set of data multiple times, one way to potentially impro to convert it to a SystemML matrix representation and then use this representation rather than performing the data conversion each time. -There are currently two mechanisms for this in SystemML: **(1) BinaryBlockMatrix** and **(2) Matrix**. - -**BinaryBlockMatrix:** - -If you have an input DataFrame, it can be converted to a BinaryBlockMatrix, and this BinaryBlockMatrix +If you have an input DataFrame, it can be converted to a Matrix, and this Matrix can be passed as an input rather than passing in the DataFrame as an input. For example, suppose we had a 10000x100 matrix represented as a DataFrame, as we saw in an earlier example. 
@@ -1633,10 +1629,10 @@ val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", "maxOut", {% endhighlight %} Rather than passing in a DataFrame each time to the Script object creation, let's instead create a -BinaryBlockMatrix object based on the DataFrame and pass this BinaryBlockMatrix to the Script object +Matrix object based on the DataFrame and pass this Matrix to the Script object creation. If we run the code below in the Spark Shell, we see that the data conversion step occurs -when the BinaryBlockMatrix object is created. However, when we create a Script object twice, we see -that no conversion penalty occurs, since this conversion occurred when the BinaryBlockMatrix was +when the Matrix object is created. However, when we create a Script object twice, we see +that no conversion penalty occurs, since this conversion occurred when the Matrix was created. {% highlight scala %} @@ -1649,14 +1645,11 @@ val data = sc.parallelize(0 to numRows-1).map { _ => Row.fromSeq(Seq.fill(numCol val schema = StructType((0 to numCols-1).map { i => StructField("C" + i, DoubleType, true) } ) val df = spark.createDataFrame(data, schema) val mm = new MatrixMetadata(numRows, numCols) -val bbm = new BinaryBlockMatrix(df, mm) -val minMaxMeanScript = dml(minMaxMean).in("Xin", bbm).out("minOut", "maxOut", "meanOut") -val minMaxMeanScript = dml(minMaxMean).in("Xin", bbm).out("minOut", "maxOut", "meanOut") +val matrix = new Matrix(df, mm) +val minMaxMeanScript = dml(minMaxMean).in("Xin", matrix).out("minOut", "maxOut", "meanOut") +val minMaxMeanScript = dml(minMaxMean).in("Xin", matrix).out("minOut", "maxOut", "meanOut") {% endhighlight %} - -**Matrix:** - When a matrix is returned as an output, it is returned as a Matrix object, which is a wrapper around a SystemML MatrixObject. As a result, an output Matrix is already in a SystemML representation, meaning that it can be passed as an input with no data conversion penalty. 
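For readers tracking the guide change above, the before/after of the MLContext caching idiom can be collected in one place. This is a sketch assembled from the guide's own snippets; `df`, `mm`, and `minMaxMean` are the DataFrame, MatrixMetadata, and DML script defined earlier in the guide, and the commented-out "before" lines use the BinaryBlockMatrix class this commit removes.

```scala
// Before this commit: wrap the DataFrame in a BinaryBlockMatrix so the
// DataFrame-to-binary-block conversion happens once, at wrapper creation.
// val bbm = new BinaryBlockMatrix(df, mm)
// val script = dml(minMaxMean).in("Xin", bbm).out("minOut", "maxOut", "meanOut")

// After this commit: Matrix absorbs the binary-block functionality, so the
// same once-only conversion happens when the Matrix is constructed.
val matrix = new Matrix(df, mm)
val script = dml(minMaxMean).in("Xin", matrix).out("minOut", "maxOut", "meanOut")
```

Either way, reusing the wrapper across multiple Script objects avoids paying the conversion penalty on every script creation.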
http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/1173a5cb/src/main/java/org/apache/sysml/api/mlcontext/BinaryBlockFrame.java ---------------------------------------------------------------------- diff --git a/src/main/java/org/apache/sysml/api/mlcontext/BinaryBlockFrame.java b/src/main/java/org/apache/sysml/api/mlcontext/BinaryBlockFrame.java deleted file mode 100644 index 8c58315..0000000 --- a/src/main/java/org/apache/sysml/api/mlcontext/BinaryBlockFrame.java +++ /dev/null @@ -1,181 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.sysml.api.mlcontext; - -import org.apache.spark.api.java.JavaPairRDD; -import org.apache.spark.sql.Dataset; -import org.apache.spark.sql.Row; -import org.apache.sysml.conf.ConfigurationManager; -import org.apache.sysml.parser.Expression.ValueType; -import org.apache.sysml.runtime.DMLRuntimeException; -import org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext; -import org.apache.sysml.runtime.matrix.MatrixCharacteristics; -import org.apache.sysml.runtime.matrix.data.FrameBlock; - -/** - * BinaryBlockFrame stores data as a SystemML binary-block frame representation. 
- * - */ -public class BinaryBlockFrame { - - JavaPairRDD<Long, FrameBlock> binaryBlocks; - FrameMetadata frameMetadata; - - /** - * Convert a Spark DataFrame to a SystemML binary-block representation. - * - * @param dataFrame - * the Spark DataFrame - * @param frameMetadata - * frame metadata, such as number of rows and columns - */ - public BinaryBlockFrame(Dataset<Row> dataFrame, FrameMetadata frameMetadata) { - this.frameMetadata = frameMetadata; - binaryBlocks = MLContextConversionUtil.dataFrameToFrameBinaryBlocks(dataFrame, frameMetadata); - } - - /** - * Convert a Spark DataFrame to a SystemML binary-block representation, - * specifying the number of rows and columns. - * - * @param dataFrame - * the Spark DataFrame - * @param numRows - * the number of rows - * @param numCols - * the number of columns - */ - public BinaryBlockFrame(Dataset<Row> dataFrame, long numRows, long numCols) { - this(dataFrame, new FrameMetadata(numRows, numCols, ConfigurationManager.getBlocksize(), - ConfigurationManager.getBlocksize())); - } - - /** - * Convert a Spark DataFrame to a SystemML binary-block representation. - * - * @param dataFrame - * the Spark DataFrame - */ - public BinaryBlockFrame(Dataset<Row> dataFrame) { - this(dataFrame, new FrameMetadata()); - } - - /** - * Create a BinaryBlockFrame, specifying the SystemML binary-block frame and - * its metadata. - * - * @param binaryBlocks - * the {@code JavaPairRDD<Long, FrameBlock>} frame - * @param matrixCharacteristics - * the frame metadata as {@code MatrixCharacteristics} - */ - public BinaryBlockFrame(JavaPairRDD<Long, FrameBlock> binaryBlocks, MatrixCharacteristics matrixCharacteristics) { - this.binaryBlocks = binaryBlocks; - this.frameMetadata = new FrameMetadata(matrixCharacteristics); - } - - /** - * Create a BinaryBlockFrame, specifying the SystemML binary-block frame and - * its metadata. 
- * - * @param binaryBlocks - * the {@code JavaPairRDD<Long, FrameBlock>} frame - * @param frameMetadata - * the frame metadata as {@code FrameMetadata} - */ - public BinaryBlockFrame(JavaPairRDD<Long, FrameBlock> binaryBlocks, FrameMetadata frameMetadata) { - this.binaryBlocks = binaryBlocks; - this.frameMetadata = frameMetadata; - } - - /** - * Obtain a SystemML binary-block frame as a - * {@code JavaPairRDD<Long, FrameBlock>} - * - * @return the SystemML binary-block frame - */ - public JavaPairRDD<Long, FrameBlock> getBinaryBlocks() { - return binaryBlocks; - } - - /** - * Obtain a SystemML binary-block frame as a {@code FrameBlock} - * - * @return the SystemML binary-block frame as a {@code FrameBlock} - */ - public FrameBlock getFrameBlock() { - try { - MatrixCharacteristics mc = getMatrixCharacteristics(); - FrameSchema frameSchema = frameMetadata.getFrameSchema(); - return SparkExecutionContext.toFrameBlock(binaryBlocks, frameSchema.getSchema().toArray(new ValueType[0]), - (int) mc.getRows(), (int) mc.getCols()); - } catch (DMLRuntimeException e) { - throw new MLContextException("Exception while getting FrameBlock from binary-block frame", e); - } - } - - /** - * Obtain the SystemML binary-block frame characteristics - * - * @return the frame metadata as {@code MatrixCharacteristics} - */ - public MatrixCharacteristics getMatrixCharacteristics() { - return frameMetadata.asMatrixCharacteristics(); - } - - /** - * Obtain the SystemML binary-block frame metadata - * - * @return the frame metadata as {@code FrameMetadata} - */ - public FrameMetadata getFrameMetadata() { - return frameMetadata; - } - - /** - * Set the SystemML binary-block frame metadata - * - * @param frameMetadata - * the frame metadata - */ - public void setFrameMetadata(FrameMetadata frameMetadata) { - this.frameMetadata = frameMetadata; - } - - /** - * Set the SystemML binary-block frame as a - * {@code JavaPairRDD<Long, FrameBlock>} - * - * @param binaryBlocks - * the SystemML binary-block 
frame - */ - public void setBinaryBlocks(JavaPairRDD<Long, FrameBlock> binaryBlocks) { - this.binaryBlocks = binaryBlocks; - } - - @Override - public String toString() { - if (frameMetadata != null) { - return frameMetadata.toString(); - } else { - return super.toString(); - } - } -} http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/1173a5cb/src/main/java/org/apache/sysml/api/mlcontext/BinaryBlockMatrix.java ---------------------------------------------------------------------- diff --git a/src/main/java/org/apache/sysml/api/mlcontext/BinaryBlockMatrix.java b/src/main/java/org/apache/sysml/api/mlcontext/BinaryBlockMatrix.java deleted file mode 100644 index c0f46be..0000000 --- a/src/main/java/org/apache/sysml/api/mlcontext/BinaryBlockMatrix.java +++ /dev/null @@ -1,168 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.sysml.api.mlcontext; - -import org.apache.spark.api.java.JavaPairRDD; -import org.apache.spark.sql.Dataset; -import org.apache.spark.sql.Row; -import org.apache.sysml.conf.ConfigurationManager; -import org.apache.sysml.runtime.DMLRuntimeException; -import org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext; -import org.apache.sysml.runtime.matrix.MatrixCharacteristics; -import org.apache.sysml.runtime.matrix.data.MatrixBlock; -import org.apache.sysml.runtime.matrix.data.MatrixIndexes; - -/** - * BinaryBlockMatrix stores data as a SystemML binary-block matrix - * representation. - * - */ -public class BinaryBlockMatrix { - - JavaPairRDD<MatrixIndexes, MatrixBlock> binaryBlocks; - MatrixMetadata matrixMetadata; - - /** - * Convert a Spark DataFrame to a SystemML binary-block representation. - * - * @param dataFrame - * the Spark DataFrame - * @param matrixMetadata - * matrix metadata, such as number of rows and columns - */ - public BinaryBlockMatrix(Dataset<Row> dataFrame, MatrixMetadata matrixMetadata) { - this.matrixMetadata = matrixMetadata; - binaryBlocks = MLContextConversionUtil.dataFrameToMatrixBinaryBlocks(dataFrame, matrixMetadata); - } - - /** - * Convert a Spark DataFrame to a SystemML binary-block representation, - * specifying the number of rows and columns. - * - * @param dataFrame - * the Spark DataFrame - * @param numRows - * the number of rows - * @param numCols - * the number of columns - */ - public BinaryBlockMatrix(Dataset<Row> dataFrame, long numRows, long numCols) { - this(dataFrame, new MatrixMetadata(numRows, numCols, ConfigurationManager.getBlocksize(), - ConfigurationManager.getBlocksize())); - } - - /** - * Convert a Spark DataFrame to a SystemML binary-block representation. 
- * - * @param dataFrame - * the Spark DataFrame - */ - public BinaryBlockMatrix(Dataset<Row> dataFrame) { - this(dataFrame, new MatrixMetadata()); - } - - /** - * Create a BinaryBlockMatrix, specifying the SystemML binary-block matrix - * and its metadata. - * - * @param binaryBlocks - * the {@code JavaPairRDD<MatrixIndexes, MatrixBlock>} matrix - * @param matrixCharacteristics - * the matrix metadata as {@code MatrixCharacteristics} - */ - public BinaryBlockMatrix(JavaPairRDD<MatrixIndexes, MatrixBlock> binaryBlocks, - MatrixCharacteristics matrixCharacteristics) { - this.binaryBlocks = binaryBlocks; - this.matrixMetadata = new MatrixMetadata(matrixCharacteristics); - } - - /** - * Obtain a SystemML binary-block matrix as a - * {@code JavaPairRDD<MatrixIndexes, MatrixBlock>} - * - * @return the SystemML binary-block matrix - */ - public JavaPairRDD<MatrixIndexes, MatrixBlock> getBinaryBlocks() { - return binaryBlocks; - } - - /** - * Obtain a SystemML binary-block matrix as a {@code MatrixBlock} - * - * @return the SystemML binary-block matrix as a {@code MatrixBlock} - */ - public MatrixBlock getMatrixBlock() { - try { - MatrixCharacteristics mc = getMatrixCharacteristics(); - return SparkExecutionContext.toMatrixBlock(binaryBlocks, (int) mc.getRows(), (int) mc.getCols(), - mc.getRowsPerBlock(), mc.getColsPerBlock(), mc.getNonZeros()); - } catch (DMLRuntimeException e) { - throw new MLContextException("Exception while getting MatrixBlock from binary-block matrix", e); - } - } - - /** - * Obtain the SystemML binary-block matrix characteristics - * - * @return the matrix metadata as {@code MatrixCharacteristics} - */ - public MatrixCharacteristics getMatrixCharacteristics() { - return matrixMetadata.asMatrixCharacteristics(); - } - - /** - * Obtain the SystemML binary-block matrix metadata - * - * @return the matrix metadata as {@code MatrixMetadata} - */ - public MatrixMetadata getMatrixMetadata() { - return matrixMetadata; - } - - /** - * Set the SystemML 
binary-block matrix metadata - * - * @param matrixMetadata - * the matrix metadata - */ - public void setMatrixMetadata(MatrixMetadata matrixMetadata) { - this.matrixMetadata = matrixMetadata; - } - - /** - * Set the SystemML binary-block matrix as a - * {@code JavaPairRDD<MatrixIndexes, MatrixBlock>} - * - * @param binaryBlocks - * the SystemML binary-block matrix - */ - public void setBinaryBlocks(JavaPairRDD<MatrixIndexes, MatrixBlock> binaryBlocks) { - this.binaryBlocks = binaryBlocks; - } - - @Override - public String toString() { - if (matrixMetadata != null) { - return matrixMetadata.toString(); - } else { - return super.toString(); - } - } -} http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/1173a5cb/src/main/java/org/apache/sysml/api/mlcontext/Frame.java ---------------------------------------------------------------------- diff --git a/src/main/java/org/apache/sysml/api/mlcontext/Frame.java b/src/main/java/org/apache/sysml/api/mlcontext/Frame.java index 9d3bb2c..db6252d 100644 --- a/src/main/java/org/apache/sysml/api/mlcontext/Frame.java +++ b/src/main/java/org/apache/sysml/api/mlcontext/Frame.java @@ -19,12 +19,16 @@ package org.apache.sysml.api.mlcontext; +import org.apache.spark.api.java.JavaPairRDD; import org.apache.spark.api.java.JavaRDD; import org.apache.spark.rdd.RDD; import org.apache.spark.sql.Dataset; import org.apache.spark.sql.Row; +import org.apache.sysml.conf.ConfigurationManager; import org.apache.sysml.runtime.controlprogram.caching.FrameObject; import org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext; +import org.apache.sysml.runtime.matrix.MatrixCharacteristics; +import org.apache.sysml.runtime.matrix.data.FrameBlock; /** * Frame encapsulates a SystemML frame. 
@@ -34,10 +38,66 @@ public class Frame { private FrameObject frameObject; private SparkExecutionContext sparkExecutionContext; + private JavaPairRDD<Long, FrameBlock> binaryBlocks; + private FrameMetadata frameMetadata; public Frame(FrameObject frameObject, SparkExecutionContext sparkExecutionContext) { this.frameObject = frameObject; this.sparkExecutionContext = sparkExecutionContext; + this.frameMetadata = new FrameMetadata(frameObject.getMatrixCharacteristics()); + } + + /** + * Convert a Spark DataFrame to a SystemML binary-block representation. + * + * @param dataFrame + * the Spark DataFrame + * @param frameMetadata + * frame metadata, such as number of rows and columns + */ + public Frame(Dataset<Row> dataFrame, FrameMetadata frameMetadata) { + this.frameMetadata = frameMetadata; + binaryBlocks = MLContextConversionUtil.dataFrameToFrameBinaryBlocks(dataFrame, frameMetadata); + } + + /** + * Convert a Spark DataFrame to a SystemML binary-block representation, + * specifying the number of rows and columns. + * + * @param dataFrame + * the Spark DataFrame + * @param numRows + * the number of rows + * @param numCols + * the number of columns + */ + public Frame(Dataset<Row> dataFrame, long numRows, long numCols) { + this(dataFrame, new FrameMetadata(numRows, numCols, ConfigurationManager.getBlocksize(), + ConfigurationManager.getBlocksize())); + } + + /** + * Convert a Spark DataFrame to a SystemML binary-block representation. + * + * @param dataFrame + * the Spark DataFrame + */ + public Frame(Dataset<Row> dataFrame) { + this(dataFrame, new FrameMetadata()); + } + + /** + * Create a Frame, specifying the SystemML binary-block frame and its + * metadata. 
+ * + * @param binaryBlocks + * the {@code JavaPairRDD<Long, FrameBlock>} frame + * @param frameMetadata + * frame metadata, such as number of rows and columns + */ + public Frame(JavaPairRDD<Long, FrameBlock> binaryBlocks, FrameMetadata frameMetadata) { + this.binaryBlocks = binaryBlocks; + this.frameMetadata = frameMetadata; } /** @@ -104,12 +164,20 @@ public class Frame { } /** - * Obtain the matrix as a {@code BinaryBlockFrame} + * Obtain the frame as a {@code JavaPairRDD<Long, FrameBlock>} * - * @return the matrix as a {@code BinaryBlockFrame} + * @return the frame as a {@code JavaPairRDD<Long, FrameBlock>} */ - public BinaryBlockFrame toBinaryBlockFrame() { - return MLContextConversionUtil.frameObjectToBinaryBlockFrame(frameObject, sparkExecutionContext); + public JavaPairRDD<Long, FrameBlock> toBinaryBlocks() { + if (binaryBlocks != null) { + return binaryBlocks; + } else if (frameObject != null) { + binaryBlocks = MLContextConversionUtil.frameObjectToBinaryBlocks(frameObject, sparkExecutionContext); + MatrixCharacteristics mc = frameObject.getMatrixCharacteristics(); + frameMetadata = new FrameMetadata(mc); + return binaryBlocks; + } + throw new MLContextException("No binary blocks or FrameObject found"); } /** @@ -118,11 +186,31 @@ public class Frame { * @return the frame metadata */ public FrameMetadata getFrameMetadata() { - return new FrameMetadata(frameObject.getMatrixCharacteristics()); + return frameMetadata; } @Override public String toString() { return frameObject.toString(); } + + /** + * Whether or not this frame contains data as binary blocks + * + * @return {@code true} if data as binary blocks are present, {@code false} + * otherwise. + */ + public boolean hasBinaryBlocks() { + return (binaryBlocks != null); + } + + /** + * Whether or not this frame contains data as a FrameObject + * + * @return {@code true} if data as a FrameObject is present, {@code false} + * otherwise. 
+ */ + public boolean hasFrameObject() { + return (frameObject != null); + } } http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/1173a5cb/src/main/java/org/apache/sysml/api/mlcontext/MLContextConversionUtil.java ---------------------------------------------------------------------- diff --git a/src/main/java/org/apache/sysml/api/mlcontext/MLContextConversionUtil.java b/src/main/java/org/apache/sysml/api/mlcontext/MLContextConversionUtil.java index 149b541..5883127 100644 --- a/src/main/java/org/apache/sysml/api/mlcontext/MLContextConversionUtil.java +++ b/src/main/java/org/apache/sysml/api/mlcontext/MLContextConversionUtil.java @@ -252,6 +252,31 @@ public class MLContextConversionUtil { } /** + * Convert a {@code JavaPairRDD<MatrixIndexes, MatrixBlock>} to a + * {@code MatrixBlock} + * + * @param binaryBlocks + * {@code JavaPairRDD<MatrixIndexes, MatrixBlock>} representation + * of a binary-block matrix + * @param matrixMetadata + * the matrix metadata + * @return the {@code JavaPairRDD<MatrixIndexes, MatrixBlock>} matrix + * converted to a {@code MatrixBlock} + */ + public static MatrixBlock binaryBlocksToMatrixBlock(JavaPairRDD<MatrixIndexes, MatrixBlock> binaryBlocks, + MatrixMetadata matrixMetadata) { + try { + MatrixBlock matrixBlock = SparkExecutionContext.toMatrixBlock(binaryBlocks, + matrixMetadata.getNumRows().intValue(), matrixMetadata.getNumColumns().intValue(), + matrixMetadata.getNumRowsPerBlock(), matrixMetadata.getNumColumnsPerBlock(), + matrixMetadata.getNumNonZeros()); + return matrixBlock; + } catch (DMLRuntimeException e) { + throw new MLContextException("Exception converting binary blocks to MatrixBlock", e); + } + } + + /** * Convert a {@code JavaPairRDD<Long, FrameBlock>} to a {@code FrameObject}. * * @param variableName @@ -803,34 +828,6 @@ public class MLContextConversionUtil { } /** - * Convert an {@code BinaryBlockMatrix} to a {@code JavaRDD<String>} in IVJ - * format. 
- * - * @param binaryBlockMatrix - * the {@code BinaryBlockMatrix} - * @return the {@code BinaryBlockMatrix} converted to a - * {@code JavaRDD<String>} - */ - public static JavaRDD<String> binaryBlockMatrixToJavaRDDStringIJV(BinaryBlockMatrix binaryBlockMatrix) { - JavaPairRDD<MatrixIndexes, MatrixBlock> binaryBlock = binaryBlockMatrix.getBinaryBlocks(); - MatrixCharacteristics mc = binaryBlockMatrix.getMatrixCharacteristics(); - return RDDConverterUtils.binaryBlockToTextCell(binaryBlock, mc); - } - - /** - * Convert an {@code BinaryBlockMatrix} to a {@code RDD<String>} in IVJ - * format. - * - * @param binaryBlockMatrix - * the {@code BinaryBlockMatrix} - * @return the {@code BinaryBlockMatrix} converted to a {@code RDD<String>} - */ - public static RDD<String> binaryBlockMatrixToRDDStringIJV(BinaryBlockMatrix binaryBlockMatrix) { - JavaRDD<String> javaRDD = binaryBlockMatrixToJavaRDDStringIJV(binaryBlockMatrix); - return JavaRDD.toRDD(javaRDD); - } - - /** * Convert a {@code MatrixObject} to a {@code JavaRDD<String>} in CSV * format. 
* @@ -1213,11 +1210,11 @@ public class MLContextConversionUtil { SparkExecutionContext sparkExecutionContext, boolean isVectorDF) { try { @SuppressWarnings("unchecked") - JavaPairRDD<MatrixIndexes, MatrixBlock> binaryBlockMatrix = (JavaPairRDD<MatrixIndexes, MatrixBlock>) sparkExecutionContext + JavaPairRDD<MatrixIndexes, MatrixBlock> binaryBlocks = (JavaPairRDD<MatrixIndexes, MatrixBlock>) sparkExecutionContext .getRDDHandleForMatrixObject(matrixObject, InputInfo.BinaryBlockInputInfo); MatrixCharacteristics mc = matrixObject.getMatrixCharacteristics(); - return RDDConverterUtils.binaryBlockToDataFrame(spark(), binaryBlockMatrix, mc, isVectorDF); + return RDDConverterUtils.binaryBlockToDataFrame(spark(), binaryBlocks, mc, isVectorDF); } catch (DMLRuntimeException e) { throw new MLContextException("DMLRuntimeException while converting matrix object to DataFrame", e); } @@ -1248,48 +1245,47 @@ public class MLContextConversionUtil { } /** - * Convert a {@code MatrixObject} to a {@code BinaryBlockMatrix}. + * Convert a {@code MatrixObject} to a + * {@code JavaPairRDD<MatrixIndexes, MatrixBlock>}. 
* * @param matrixObject * the {@code MatrixObject} * @param sparkExecutionContext * the Spark execution context - * @return the {@code MatrixObject} converted to a {@code BinaryBlockMatrix} + * @return the {@code MatrixObject} converted to a + * {@code JavaPairRDD<MatrixIndexes, MatrixBlock>} */ - public static BinaryBlockMatrix matrixObjectToBinaryBlockMatrix(MatrixObject matrixObject, + public static JavaPairRDD<MatrixIndexes, MatrixBlock> matrixObjectToBinaryBlocks(MatrixObject matrixObject, SparkExecutionContext sparkExecutionContext) { try { @SuppressWarnings("unchecked") - JavaPairRDD<MatrixIndexes, MatrixBlock> binaryBlock = (JavaPairRDD<MatrixIndexes, MatrixBlock>) sparkExecutionContext + JavaPairRDD<MatrixIndexes, MatrixBlock> binaryBlocks = (JavaPairRDD<MatrixIndexes, MatrixBlock>) sparkExecutionContext .getRDDHandleForMatrixObject(matrixObject, InputInfo.BinaryBlockInputInfo); - MatrixCharacteristics matrixCharacteristics = matrixObject.getMatrixCharacteristics(); - return new BinaryBlockMatrix(binaryBlock, matrixCharacteristics); + return binaryBlocks; } catch (DMLRuntimeException e) { - throw new MLContextException("DMLRuntimeException while converting matrix object to BinaryBlockMatrix", e); + throw new MLContextException("DMLRuntimeException while converting matrix object to binary blocks", e); } } /** - * Convert a {@code FrameObject} to a {@code BinaryBlockFrame}. + * Convert a {@code FrameObject} to a {@code JavaPairRDD<Long, FrameBlock>}. 
* * @param frameObject * the {@code FrameObject} * @param sparkExecutionContext * the Spark execution context - * @return the {@code FrameObject} converted to a {@code BinaryBlockFrame} + * @return the {@code FrameObject} converted to a + * {@code JavaPairRDD<Long, FrameBlock>} */ - public static BinaryBlockFrame frameObjectToBinaryBlockFrame(FrameObject frameObject, + public static JavaPairRDD<Long, FrameBlock> frameObjectToBinaryBlocks(FrameObject frameObject, SparkExecutionContext sparkExecutionContext) { try { @SuppressWarnings("unchecked") - JavaPairRDD<Long, FrameBlock> binaryBlock = (JavaPairRDD<Long, FrameBlock>) sparkExecutionContext + JavaPairRDD<Long, FrameBlock> binaryBlocks = (JavaPairRDD<Long, FrameBlock>) sparkExecutionContext .getRDDHandleForFrameObject(frameObject, InputInfo.BinaryBlockInputInfo); - MatrixCharacteristics matrixCharacteristics = frameObject.getMatrixCharacteristics(); - FrameSchema fs = new FrameSchema(Arrays.asList(frameObject.getSchema())); - FrameMetadata fm = new FrameMetadata(fs, matrixCharacteristics); - return new BinaryBlockFrame(binaryBlock, fm); + return binaryBlocks; } catch (DMLRuntimeException e) { - throw new MLContextException("DMLRuntimeException while converting frame object to BinaryBlockFrame", e); + throw new MLContextException("DMLRuntimeException while converting frame object to binary blocks", e); } } http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/1173a5cb/src/main/java/org/apache/sysml/api/mlcontext/MLContextUtil.java ---------------------------------------------------------------------- diff --git a/src/main/java/org/apache/sysml/api/mlcontext/MLContextUtil.java b/src/main/java/org/apache/sysml/api/mlcontext/MLContextUtil.java index 50defcc..2c9566c 100644 --- a/src/main/java/org/apache/sysml/api/mlcontext/MLContextUtil.java +++ b/src/main/java/org/apache/sysml/api/mlcontext/MLContextUtil.java @@ -94,9 +94,8 @@ public final class MLContextUtil { * Complex data types supported by the MLContext 
API */ @SuppressWarnings("rawtypes") - public static final Class[] COMPLEX_DATA_TYPES = { JavaRDD.class, RDD.class, Dataset.class, BinaryBlockMatrix.class, - BinaryBlockFrame.class, Matrix.class, Frame.class, (new double[][] {}).getClass(), MatrixBlock.class, - URL.class }; + public static final Class[] COMPLEX_DATA_TYPES = { JavaRDD.class, RDD.class, Dataset.class, Matrix.class, + Frame.class, (new double[][] {}).getClass(), MatrixBlock.class, URL.class }; /** * All data types supported by the MLContext API @@ -414,7 +413,7 @@ public final class MLContextUtil { /** * Is the object one of the supported complex data types? (JavaRDD, RDD, - * DataFrame, BinaryBlockMatrix, Matrix, double[][], MatrixBlock, URL) + * DataFrame, Matrix, double[][], MatrixBlock, URL) * * @param object * the object type to be examined @@ -587,26 +586,29 @@ public final class MLContextUtil { return MLContextConversionUtil.dataFrameToFrameObject(name, dataFrame); } } - } else if (value instanceof BinaryBlockMatrix) { - BinaryBlockMatrix binaryBlockMatrix = (BinaryBlockMatrix) value; - if (metadata == null) { - metadata = binaryBlockMatrix.getMatrixMetadata(); - } - JavaPairRDD<MatrixIndexes, MatrixBlock> binaryBlocks = binaryBlockMatrix.getBinaryBlocks(); - return MLContextConversionUtil.binaryBlocksToMatrixObject(name, binaryBlocks, (MatrixMetadata) metadata); - } else if (value instanceof BinaryBlockFrame) { - BinaryBlockFrame binaryBlockFrame = (BinaryBlockFrame) value; - if (metadata == null) { - metadata = binaryBlockFrame.getFrameMetadata(); - } - JavaPairRDD<Long, FrameBlock> binaryBlocks = binaryBlockFrame.getBinaryBlocks(); - return MLContextConversionUtil.binaryBlocksToFrameObject(name, binaryBlocks, (FrameMetadata) metadata); } else if (value instanceof Matrix) { Matrix matrix = (Matrix) value; - return matrix.toMatrixObject(); + if ((matrix.hasBinaryBlocks()) && (!matrix.hasMatrixObject())) { + if (metadata == null) { + metadata = matrix.getMatrixMetadata(); + } + 
JavaPairRDD<MatrixIndexes, MatrixBlock> binaryBlocks = matrix.toBinaryBlocks(); + return MLContextConversionUtil.binaryBlocksToMatrixObject(name, binaryBlocks, + (MatrixMetadata) metadata); + } else { + return matrix.toMatrixObject(); + } } else if (value instanceof Frame) { Frame frame = (Frame) value; - return frame.toFrameObject(); + if ((frame.hasBinaryBlocks()) && (!frame.hasFrameObject())) { + if (metadata == null) { + metadata = frame.getFrameMetadata(); + } + JavaPairRDD<Long, FrameBlock> binaryBlocks = frame.toBinaryBlocks(); + return MLContextConversionUtil.binaryBlocksToFrameObject(name, binaryBlocks, (FrameMetadata) metadata); + } else { + return frame.toFrameObject(); + } } else if (value instanceof double[][]) { double[][] doubleMatrix = (double[][]) value; return MLContextConversionUtil.doubleMatrixToMatrixObject(name, doubleMatrix, (MatrixMetadata) metadata); http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/1173a5cb/src/main/java/org/apache/sysml/api/mlcontext/MLResults.java ---------------------------------------------------------------------- diff --git a/src/main/java/org/apache/sysml/api/mlcontext/MLResults.java b/src/main/java/org/apache/sysml/api/mlcontext/MLResults.java index ebde918..e4d23da 100644 --- a/src/main/java/org/apache/sysml/api/mlcontext/MLResults.java +++ b/src/main/java/org/apache/sysml/api/mlcontext/MLResults.java @@ -489,18 +489,6 @@ public class MLResults { } /** - * Obtain an output as a {@code BinaryBlockMatrix}. - * - * @param outputName - * the name of the output - * @return the output as a {@code BinaryBlockMatrix} - */ - public BinaryBlockMatrix getBinaryBlockMatrix(String outputName) { - MatrixObject mo = getMatrixObject(outputName); - return MLContextConversionUtil.matrixObjectToBinaryBlockMatrix(mo, sparkExecutionContext); - } - - /** * Obtain an output as a two-dimensional {@code String} array. 
* * @param outputName http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/1173a5cb/src/main/java/org/apache/sysml/api/mlcontext/Matrix.java ---------------------------------------------------------------------- diff --git a/src/main/java/org/apache/sysml/api/mlcontext/Matrix.java b/src/main/java/org/apache/sysml/api/mlcontext/Matrix.java index 07ed516..5d4a67d 100644 --- a/src/main/java/org/apache/sysml/api/mlcontext/Matrix.java +++ b/src/main/java/org/apache/sysml/api/mlcontext/Matrix.java @@ -19,29 +19,90 @@ package org.apache.sysml.api.mlcontext; +import org.apache.spark.api.java.JavaPairRDD; import org.apache.spark.api.java.JavaRDD; import org.apache.spark.rdd.RDD; import org.apache.spark.sql.Dataset; import org.apache.spark.sql.Row; +import org.apache.sysml.conf.ConfigurationManager; import org.apache.sysml.runtime.controlprogram.caching.MatrixObject; import org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext; import org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtils; +import org.apache.sysml.runtime.matrix.MatrixCharacteristics; +import org.apache.sysml.runtime.matrix.data.MatrixBlock; +import org.apache.sysml.runtime.matrix.data.MatrixIndexes; /** * Matrix encapsulates a SystemML matrix. It allows for easy conversion to - * various other formats, such as RDDs, JavaRDDs, DataFrames, - * BinaryBlockMatrices, and double[][]s. After script execution, it offers a - * convenient format for obtaining SystemML matrix data in Scala tuples. + * various other formats, such as RDDs, JavaRDDs, DataFrames, and double[][]s. + * After script execution, it offers a convenient format for obtaining SystemML + * matrix data in Scala tuples. 
* */ public class Matrix { private MatrixObject matrixObject; private SparkExecutionContext sparkExecutionContext; + private JavaPairRDD<MatrixIndexes, MatrixBlock> binaryBlocks; + private MatrixMetadata matrixMetadata; public Matrix(MatrixObject matrixObject, SparkExecutionContext sparkExecutionContext) { this.matrixObject = matrixObject; this.sparkExecutionContext = sparkExecutionContext; + this.matrixMetadata = new MatrixMetadata(matrixObject.getMatrixCharacteristics()); + } + + /** + * Convert a Spark DataFrame to a SystemML binary-block representation. + * + * @param dataFrame + * the Spark DataFrame + * @param matrixMetadata + * matrix metadata, such as number of rows and columns + */ + public Matrix(Dataset<Row> dataFrame, MatrixMetadata matrixMetadata) { + this.matrixMetadata = matrixMetadata; + binaryBlocks = MLContextConversionUtil.dataFrameToMatrixBinaryBlocks(dataFrame, matrixMetadata); + } + + /** + * Convert a Spark DataFrame to a SystemML binary-block representation, + * specifying the number of rows and columns. + * + * @param dataFrame + * the Spark DataFrame + * @param numRows + * the number of rows + * @param numCols + * the number of columns + */ + public Matrix(Dataset<Row> dataFrame, long numRows, long numCols) { + this(dataFrame, new MatrixMetadata(numRows, numCols, ConfigurationManager.getBlocksize(), + ConfigurationManager.getBlocksize())); + } + + /** + * Create a Matrix, specifying the SystemML binary-block matrix and its + * metadata. + * + * @param binaryBlocks + * the {@code JavaPairRDD<MatrixIndexes, MatrixBlock>} matrix + * @param matrixMetadata + * matrix metadata, such as number of rows and columns + */ + public Matrix(JavaPairRDD<MatrixIndexes, MatrixBlock> binaryBlocks, MatrixMetadata matrixMetadata) { + this.binaryBlocks = binaryBlocks; + this.matrixMetadata = matrixMetadata; + } + + /** + * Convert a Spark DataFrame to a SystemML binary-block representation. 
+ * + * @param dataFrame + * the Spark DataFrame + */ + public Matrix(Dataset<Row> dataFrame) { + this(dataFrame, new MatrixMetadata()); } /** @@ -146,12 +207,37 @@ public class Matrix { } /** - * Obtain the matrix as a {@code BinaryBlockMatrix} + * Obtain the matrix as a {@code JavaPairRDD<MatrixIndexes, MatrixBlock>} * - * @return the matrix as a {@code BinaryBlockMatrix} + * @return the matrix as a {@code JavaPairRDD<MatrixIndexes, MatrixBlock>} */ - public BinaryBlockMatrix toBinaryBlockMatrix() { - return MLContextConversionUtil.matrixObjectToBinaryBlockMatrix(matrixObject, sparkExecutionContext); + public JavaPairRDD<MatrixIndexes, MatrixBlock> toBinaryBlocks() { + if (binaryBlocks != null) { + return binaryBlocks; + } else if (matrixObject != null) { + binaryBlocks = MLContextConversionUtil.matrixObjectToBinaryBlocks(matrixObject, sparkExecutionContext); + MatrixCharacteristics mc = matrixObject.getMatrixCharacteristics(); + matrixMetadata = new MatrixMetadata(mc); + return binaryBlocks; + } + throw new MLContextException("No binary blocks or MatrixObject found"); + } + + /** + * Obtain the matrix as a {@code MatrixBlock} + * + * @return the matrix as a {@code MatrixBlock} + */ + public MatrixBlock toMatrixBlock() { + if (matrixMetadata == null) { + throw new MLContextException("Matrix metadata required to convert binary blocks to a MatrixBlock."); + } + if (binaryBlocks != null) { + return MLContextConversionUtil.binaryBlocksToMatrixBlock(binaryBlocks, matrixMetadata); + } else if (matrixObject != null) { + return MLContextConversionUtil.binaryBlocksToMatrixBlock(toBinaryBlocks(), matrixMetadata); + } + throw new MLContextException("No binary blocks or MatrixObject found"); } /** @@ -160,11 +246,44 @@ public class Matrix { * @return the matrix metadata */ public MatrixMetadata getMatrixMetadata() { - return new MatrixMetadata(matrixObject.getMatrixCharacteristics()); + return matrixMetadata; } + /** + * If {@code MatrixObject} is available, output + * 
{@code MatrixObject.toString()}. If {@code MatrixObject} is not available + but {@code MatrixMetadata} is available, output + {@code MatrixMetadata.toString()}. Otherwise output + {@code Object.toString()}. + */ @Override public String toString() { - return matrixObject.toString(); + if (matrixObject != null) { + return matrixObject.toString(); + } else if (matrixMetadata != null) { + return matrixMetadata.toString(); + } else { + return super.toString(); + } + } + + /** + * Whether or not this matrix contains data as binary blocks + * + * @return {@code true} if data as binary blocks are present, {@code false} + * otherwise. + */ + public boolean hasBinaryBlocks() { + return (binaryBlocks != null); + } + + /** + * Whether or not this matrix contains data as a MatrixObject + * + * @return {@code true} if data as a {@code MatrixObject} is present, {@code false} + * otherwise. + */ + public boolean hasMatrixObject() { + return (matrixObject != null); + } } http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/1173a5cb/src/main/python/systemml/mlcontext.py ---------------------------------------------------------------------- diff --git a/src/main/python/systemml/mlcontext.py b/src/main/python/systemml/mlcontext.py index fc6d75c..0eeb981 100644 --- a/src/main/python/systemml/mlcontext.py +++ b/src/main/python/systemml/mlcontext.py @@ -174,7 +174,7 @@ class Matrix(object): NumPy Array A NumPy Array representing the Matrix object. 
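The `toBinaryBlocks()` method above converts lazily and caches: the expensive MatrixObject-to-blocks conversion runs at most once, and metadata is refreshed from the source when it runs. A plain-Java sketch of that caching pattern, with placeholder types standing in for the SystemML classes:

```java
public class LazyBlocksSketch {
    private Object matrixObject;   // stands in for MatrixObject
    private int[] binaryBlocks;    // stands in for JavaPairRDD<MatrixIndexes, MatrixBlock>
    private String matrixMetadata; // stands in for MatrixMetadata
    private int conversions;       // counts how often the expensive conversion ran

    public LazyBlocksSketch(Object matrixObject) {
        this.matrixObject = matrixObject;
    }

    public int[] toBinaryBlocks() {
        if (binaryBlocks != null) {
            return binaryBlocks;                  // cached result, no re-conversion
        } else if (matrixObject != null) {
            conversions++;
            binaryBlocks = new int[] { 1, 2, 3 }; // placeholder for the real conversion
            matrixMetadata = "derived from MatrixObject";
            return binaryBlocks;
        }
        throw new IllegalStateException("No binary blocks or MatrixObject found");
    }

    public int conversionCount() { return conversions; }
}
```

Repeated calls return the same cached reference, and a wrapper constructed with neither representation fails fast, matching the `MLContextException` thrown in the real method.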
""" - np_array = convertToNumPyArr(self._sc, self._java_matrix.toBinaryBlockMatrix().getMatrixBlock()) + np_array = convertToNumPyArr(self._sc, self._java_matrix.toMatrixBlock()) return np_array http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/1173a5cb/src/main/scala/org/apache/sysml/api/dl/Caffe2DML.scala ---------------------------------------------------------------------- diff --git a/src/main/scala/org/apache/sysml/api/dl/Caffe2DML.scala b/src/main/scala/org/apache/sysml/api/dl/Caffe2DML.scala index 7fb3e17..f9b7ecc 100644 --- a/src/main/scala/org/apache/sysml/api/dl/Caffe2DML.scala +++ b/src/main/scala/org/apache/sysml/api/dl/Caffe2DML.scala @@ -507,8 +507,8 @@ class Caffe2DMLModel(val mloutput: MLResults, val script = dml(predictionScript).out("Prob").in(estimator.inputs) if(mloutput != null) { // fit was called - net.getLayers.map(net.getCaffeLayer(_)).filter(_.weight != null).map(l => script.in(l.weight, mloutput.getBinaryBlockMatrix(l.weight))) - net.getLayers.map(net.getCaffeLayer(_)).filter(_.bias != null).map(l => script.in(l.bias, mloutput.getBinaryBlockMatrix(l.bias))) + net.getLayers.map(net.getCaffeLayer(_)).filter(_.weight != null).map(l => script.in(l.weight, mloutput.getMatrix(l.weight))) + net.getLayers.map(net.getCaffeLayer(_)).filter(_.bias != null).map(l => script.in(l.bias, mloutput.getMatrix(l.bias))) } (script, "X_full") } http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/1173a5cb/src/main/scala/org/apache/sysml/api/ml/BaseSystemMLClassifier.scala ---------------------------------------------------------------------- diff --git a/src/main/scala/org/apache/sysml/api/ml/BaseSystemMLClassifier.scala b/src/main/scala/org/apache/sysml/api/ml/BaseSystemMLClassifier.scala index 2ea305b..918a48d 100644 --- a/src/main/scala/org/apache/sysml/api/ml/BaseSystemMLClassifier.scala +++ b/src/main/scala/org/apache/sysml/api/ml/BaseSystemMLClassifier.scala @@ -182,7 +182,7 @@ trait BaseSystemMLEstimatorModel extends 
BaseSystemMLEstimatorOrModel { } val script = dml(dmlScript.toString) for(varName <- modelVariables) { - script.in(varName, baseEstimator.mloutput.getBinaryBlockMatrix(varName)) + script.in(varName, baseEstimator.mloutput.getMatrix(varName)) } val ml = new MLContext(sc) ml.execute(script) @@ -208,7 +208,8 @@ trait BaseSystemMLClassifier extends BaseSystemMLEstimator { val revLabelMapping = new java.util.HashMap[Int, String] val yin = df.select("label") val ret = getTrainingScript(isSingleNode) - val Xbin = new BinaryBlockMatrix(Xin, mcXin) + val mmXin = new MatrixMetadata(mcXin) + val Xbin = new Matrix(Xin, mmXin) val script = ret._1.in(ret._2, Xbin).in(ret._3, yin) ml.execute(script) } @@ -225,7 +226,7 @@ trait BaseSystemMLClassifierModel extends BaseSystemMLEstimatorModel { // ml.setExplainLevel(ExplainLevel.RECOMPILE_RUNTIME) val modelPredict = ml.execute(script._1.in(script._2, X, new MatrixMetadata(X.getNumRows, X.getNumColumns, X.getNonZeros))) val ret = PredictionUtils.computePredictedClassLabelsFromProbability(modelPredict, isSingleNode, sc, probVar) - .getBinaryBlockMatrix("Prediction").getMatrixBlock + .getMatrix("Prediction").toMatrixBlock if(ret.getNumColumns != 1) { throw new RuntimeException("Expected predicted label to be a column vector") @@ -241,7 +242,8 @@ trait BaseSystemMLClassifierModel extends BaseSystemMLEstimatorModel { val mcXin = new MatrixCharacteristics() val Xin = RDDConverterUtils.dataFrameToBinaryBlock(df.rdd.sparkContext, df.asInstanceOf[DataFrame].select("features"), mcXin, false, true) val script = getPredictionScript(isSingleNode) - val Xin_bin = new BinaryBlockMatrix(Xin, mcXin) + val mmXin = new MatrixMetadata(mcXin) + val Xin_bin = new Matrix(Xin, mmXin) val modelPredict = ml.execute(script._1.in(script._2, Xin_bin)) val predLabelOut = PredictionUtils.computePredictedClassLabelsFromProbability(modelPredict, isSingleNode, sc, probVar) val predictedDF = 
predLabelOut.getDataFrame("Prediction").select(RDDConverterUtils.DF_ID_COLUMN, "C1").withColumnRenamed("C1", "prediction") http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/1173a5cb/src/main/scala/org/apache/sysml/api/ml/BaseSystemMLRegressor.scala ---------------------------------------------------------------------- diff --git a/src/main/scala/org/apache/sysml/api/ml/BaseSystemMLRegressor.scala b/src/main/scala/org/apache/sysml/api/ml/BaseSystemMLRegressor.scala index 9e2a34a..5610bf3 100644 --- a/src/main/scala/org/apache/sysml/api/ml/BaseSystemMLRegressor.scala +++ b/src/main/scala/org/apache/sysml/api/ml/BaseSystemMLRegressor.scala @@ -52,7 +52,8 @@ trait BaseSystemMLRegressor extends BaseSystemMLEstimator { val Xin = RDDConverterUtils.dataFrameToBinaryBlock(sc, df.asInstanceOf[DataFrame], mcXin, false, true) val yin = df.select("label") val ret = getTrainingScript(isSingleNode) - val Xbin = new BinaryBlockMatrix(Xin, mcXin) + val mmXin = new MatrixMetadata(mcXin) + val Xbin = new Matrix(Xin, mmXin) val script = ret._1.in(ret._2, Xbin).in(ret._3, yin) ml.execute(script) } @@ -66,7 +67,7 @@ trait BaseSystemMLRegressorModel extends BaseSystemMLEstimatorModel { updateML(ml) val script = getPredictionScript(isSingleNode) val modelPredict = ml.execute(script._1.in(script._2, X)) - val ret = modelPredict.getBinaryBlockMatrix(predictionVar).getMatrixBlock + val ret = modelPredict.getMatrix(predictionVar).toMatrixBlock if(ret.getNumColumns != 1) { throw new RuntimeException("Expected prediction to be a column vector") @@ -81,7 +82,8 @@ trait BaseSystemMLRegressorModel extends BaseSystemMLEstimatorModel { val mcXin = new MatrixCharacteristics() val Xin = RDDConverterUtils.dataFrameToBinaryBlock(df.rdd.sparkContext, df.asInstanceOf[DataFrame], mcXin, false, true) val script = getPredictionScript(isSingleNode) - val Xin_bin = new BinaryBlockMatrix(Xin, mcXin) + val mmXin = new MatrixMetadata(mcXin) + val Xin_bin = new Matrix(Xin, mmXin) val modelPredict = 
ml.execute(script._1.in(script._2, Xin_bin)) val predictedDF = modelPredict.getDataFrame(predictionVar).select(RDDConverterUtils.DF_ID_COLUMN, "C1").withColumnRenamed("C1", "prediction") val dataset = RDDConverterUtilsExt.addIDToDataFrame(df.asInstanceOf[DataFrame], df.sparkSession, RDDConverterUtils.DF_ID_COLUMN) http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/1173a5cb/src/main/scala/org/apache/sysml/api/ml/LinearRegression.scala ---------------------------------------------------------------------- diff --git a/src/main/scala/org/apache/sysml/api/ml/LinearRegression.scala b/src/main/scala/org/apache/sysml/api/ml/LinearRegression.scala index 463d81a..ac6c22c 100644 --- a/src/main/scala/org/apache/sysml/api/ml/LinearRegression.scala +++ b/src/main/scala/org/apache/sysml/api/ml/LinearRegression.scala @@ -99,7 +99,7 @@ class LinearRegressionModel(override val uid: String)(estimator:LinearRegression } def getPredictionScript(isSingleNode:Boolean): (Script, String) = - PredictionUtils.getGLMPredictionScript(estimator.mloutput.getBinaryBlockMatrix("beta_out"), isSingleNode) + PredictionUtils.getGLMPredictionScript(estimator.mloutput.getMatrix("beta_out"), isSingleNode) def modelVariables():List[String] = List[String]("beta_out") http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/1173a5cb/src/main/scala/org/apache/sysml/api/ml/LogisticRegression.scala ---------------------------------------------------------------------- diff --git a/src/main/scala/org/apache/sysml/api/ml/LogisticRegression.scala b/src/main/scala/org/apache/sysml/api/ml/LogisticRegression.scala index f4b5afe..1c368c1 100644 --- a/src/main/scala/org/apache/sysml/api/ml/LogisticRegression.scala +++ b/src/main/scala/org/apache/sysml/api/ml/LogisticRegression.scala @@ -103,7 +103,7 @@ class LogisticRegressionModel(override val uid: String)( this("model")(estimator, estimator.sc) } def getPredictionScript(isSingleNode:Boolean): (Script, String) = - 
PredictionUtils.getGLMPredictionScript(estimator.mloutput.getBinaryBlockMatrix("B_out"), isSingleNode, 3) + PredictionUtils.getGLMPredictionScript(estimator.mloutput.getMatrix("B_out"), isSingleNode, 3) def baseEstimator():BaseSystemMLEstimator = estimator def modelVariables():List[String] = List[String]("B_out") http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/1173a5cb/src/main/scala/org/apache/sysml/api/ml/NaiveBayes.scala ---------------------------------------------------------------------- diff --git a/src/main/scala/org/apache/sysml/api/ml/NaiveBayes.scala b/src/main/scala/org/apache/sysml/api/ml/NaiveBayes.scala index b2e967b..bc4e77d 100644 --- a/src/main/scala/org/apache/sysml/api/ml/NaiveBayes.scala +++ b/src/main/scala/org/apache/sysml/api/ml/NaiveBayes.scala @@ -95,15 +95,15 @@ class NaiveBayesModel(override val uid: String) .in("$probabilities", " ") .out("probs") - val classPrior = estimator.mloutput.getBinaryBlockMatrix("classPrior") - val classConditionals = estimator.mloutput.getBinaryBlockMatrix("classConditionals") + val classPrior = estimator.mloutput.getMatrix("classPrior") + val classConditionals = estimator.mloutput.getMatrix("classConditionals") val ret = if(isSingleNode) { - script.in("prior", classPrior.getMatrixBlock, classPrior.getMatrixMetadata) - .in("conditionals", classConditionals.getMatrixBlock, classConditionals.getMatrixMetadata) + script.in("prior", classPrior.toMatrixBlock, classPrior.getMatrixMetadata) + .in("conditionals", classConditionals.toMatrixBlock, classConditionals.getMatrixMetadata) } else { - script.in("prior", classPrior.getBinaryBlocks, classPrior.getMatrixMetadata) - .in("conditionals", classConditionals.getBinaryBlocks, classConditionals.getMatrixMetadata) + script.in("prior", classPrior.toBinaryBlocks, classPrior.getMatrixMetadata) + .in("conditionals", classConditionals.toBinaryBlocks, classConditionals.getMatrixMetadata) } (ret, "D") } 
http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/1173a5cb/src/main/scala/org/apache/sysml/api/ml/PredictionUtils.scala ---------------------------------------------------------------------- diff --git a/src/main/scala/org/apache/sysml/api/ml/PredictionUtils.scala b/src/main/scala/org/apache/sysml/api/ml/PredictionUtils.scala index 585339f..3406169 100644 --- a/src/main/scala/org/apache/sysml/api/ml/PredictionUtils.scala +++ b/src/main/scala/org/apache/sysml/api/ml/PredictionUtils.scala @@ -30,18 +30,18 @@ import org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtils import org.apache.sysml.api.mlcontext.MLResults import org.apache.sysml.api.mlcontext.ScriptFactory._ import org.apache.sysml.api.mlcontext.Script -import org.apache.sysml.api.mlcontext.BinaryBlockMatrix +import org.apache.sysml.api.mlcontext.Matrix object PredictionUtils { - def getGLMPredictionScript(B_full: BinaryBlockMatrix, isSingleNode:Boolean, dfam:java.lang.Integer=1): (Script, String) = { + def getGLMPredictionScript(B_full: Matrix, isSingleNode:Boolean, dfam:java.lang.Integer=1): (Script, String) = { val script = dml(ScriptsUtils.getDMLScript(LogisticRegressionModel.scriptPath)) .in("$X", " ") .in("$B", " ") .in("$dfam", dfam) .out("means") val ret = if(isSingleNode) { - script.in("B_full", B_full.getMatrixBlock, B_full.getMatrixMetadata) + script.in("B_full", B_full.toMatrixBlock, B_full.getMatrixMetadata) } else { script.in("B_full", B_full) @@ -61,9 +61,9 @@ object PredictionUtils { Prediction = rowIndexMax(Prob); # assuming one-based label mapping write(Prediction, "tempOut", "csv"); """).out("Prediction") - val probVar = mlscoreoutput.getBinaryBlockMatrix(inProbVar) + val probVar = mlscoreoutput.getMatrix(inProbVar) if(isSingleNode) { - ml.execute(script.in("Prob", probVar.getMatrixBlock, probVar.getMatrixMetadata)) + ml.execute(script.in("Prob", probVar.toMatrixBlock, probVar.getMatrixMetadata)) } else { ml.execute(script.in("Prob", probVar)) 
http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/1173a5cb/src/main/scala/org/apache/sysml/api/ml/SVM.scala ---------------------------------------------------------------------- diff --git a/src/main/scala/org/apache/sysml/api/ml/SVM.scala b/src/main/scala/org/apache/sysml/api/ml/SVM.scala index d706101..256bd77 100644 --- a/src/main/scala/org/apache/sysml/api/ml/SVM.scala +++ b/src/main/scala/org/apache/sysml/api/ml/SVM.scala @@ -103,11 +103,11 @@ class SVMModel (override val uid: String)(estimator:SVM, val sc: SparkContext, v .in("$model", " ") .out("scores") - val w = estimator.mloutput.getBinaryBlockMatrix("w") + val w = estimator.mloutput.getMatrix("w") val wVar = if(isMultiClass) "W" else "w" val ret = if(isSingleNode) { - script.in(wVar, w.getMatrixBlock, w.getMatrixMetadata) + script.in(wVar, w.toMatrixBlock, w.getMatrixMetadata) } else { script.in(wVar, w) http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/1173a5cb/src/test/java/org/apache/sysml/test/integration/functions/mlcontext/DataFrameVectorScriptTest.java ---------------------------------------------------------------------- diff --git a/src/test/java/org/apache/sysml/test/integration/functions/mlcontext/DataFrameVectorScriptTest.java b/src/test/java/org/apache/sysml/test/integration/functions/mlcontext/DataFrameVectorScriptTest.java index 0f3d3b2..4ae22cd 100644 --- a/src/test/java/org/apache/sysml/test/integration/functions/mlcontext/DataFrameVectorScriptTest.java +++ b/src/test/java/org/apache/sysml/test/integration/functions/mlcontext/DataFrameVectorScriptTest.java @@ -6,9 +6,9 @@ * to you under the Apache License, Version 2.0 (the * "License"); you may not use this file except in compliance * with the License. 
You may obtain a copy of the License at - * + * * http://www.apache.org/licenses/LICENSE-2.0 - * + * * Unless required by applicable law or agreed to in writing, * software distributed under the License is distributed on an * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY @@ -57,7 +57,7 @@ import org.junit.BeforeClass; import org.junit.Test; -public class DataFrameVectorScriptTest extends AutomatedTestBase +public class DataFrameVectorScriptTest extends AutomatedTestBase { private final static String TEST_DIR = "functions/mlcontext/"; private final static String TEST_NAME = "DataFrameConversion"; @@ -68,7 +68,7 @@ public class DataFrameVectorScriptTest extends AutomatedTestBase private final static ValueType[] schemaDoubles = new ValueType[]{ValueType.DOUBLE, ValueType.DOUBLE, ValueType.OBJECT, ValueType.DOUBLE}; private final static ValueType[] schemaMixed1 = new ValueType[]{ValueType.OBJECT, ValueType.INT, ValueType.STRING, ValueType.DOUBLE, ValueType.INT}; private final static ValueType[] schemaMixed2 = new ValueType[]{ValueType.STRING, ValueType.OBJECT, ValueType.DOUBLE}; - + private final static int rows1 = 2245; private final static int colsVector = 7; private final static double sparsity1 = 0.9; @@ -94,37 +94,37 @@ public class DataFrameVectorScriptTest extends AutomatedTestBase public void testVectorStringsConversionIDDenseUnknown() { testDataFrameScriptInput(schemaStrings, true, false, true); } - + @Test public void testVectorDoublesConversionIDDenseUnknown() { testDataFrameScriptInput(schemaDoubles, true, false, true); } - + @Test public void testVectorMixed1ConversionIDDenseUnknown() { testDataFrameScriptInput(schemaMixed1, true, false, true); } - + @Test public void testVectorMixed2ConversionIDDenseUnknown() { testDataFrameScriptInput(schemaMixed2, true, false, true); } - + @Test public void testVectorStringsConversionIDDense() { testDataFrameScriptInput(schemaStrings, true, false, false); } - + @Test public void 
testVectorDoublesConversionIDDense() { testDataFrameScriptInput(schemaDoubles, true, false, false); } - + @Test public void testVectorMixed1ConversionIDDense() { testDataFrameScriptInput(schemaMixed1, true, false, false); } - + @Test public void testVectorMixed2ConversionIDDense() { testDataFrameScriptInput(schemaMixed2, true, false, false); @@ -134,37 +134,37 @@ public class DataFrameVectorScriptTest extends AutomatedTestBase public void testVectorStringsConversionIDSparseUnknown() { testDataFrameScriptInput(schemaStrings, true, true, true); } - + @Test public void testVectorDoublesConversionIDSparseUnknown() { testDataFrameScriptInput(schemaDoubles, true, true, true); } - + @Test public void testVectorMixed1ConversionIDSparseUnknown() { testDataFrameScriptInput(schemaMixed1, true, true, true); } - + @Test public void testVectorMixed2ConversionIDSparseUnknown() { testDataFrameScriptInput(schemaMixed2, true, true, true); } - + @Test public void testVectorStringsConversionIDSparse() { testDataFrameScriptInput(schemaStrings, true, true, false); } - + @Test public void testVectorDoublesConversionIDSparse() { testDataFrameScriptInput(schemaDoubles, true, true, false); } - + @Test public void testVectorMixed1ConversionIDSparse() { testDataFrameScriptInput(schemaMixed1, true, true, false); } - + @Test public void testVectorMixed2ConversionIDSparse() { testDataFrameScriptInput(schemaMixed2, true, true, false); @@ -174,37 +174,37 @@ public class DataFrameVectorScriptTest extends AutomatedTestBase public void testVectorStringsConversionDenseUnknown() { testDataFrameScriptInput(schemaStrings, false, false, true); } - + @Test public void testVectorDoublesConversionDenseUnknown() { testDataFrameScriptInput(schemaDoubles, false, false, true); } - + @Test public void testVectorMixed1ConversionDenseUnknown() { testDataFrameScriptInput(schemaMixed1, false, false, true); } - + @Test public void testVectorMixed2ConversionDenseUnknown() { testDataFrameScriptInput(schemaMixed2, false, 
false, true); } - + @Test public void testVectorStringsConversionDense() { testDataFrameScriptInput(schemaStrings, false, false, false); } - + @Test public void testVectorDoublesConversionDense() { testDataFrameScriptInput(schemaDoubles, false, false, false); } - + @Test public void testVectorMixed1ConversionDense() { testDataFrameScriptInput(schemaMixed1, false, false, false); } - + @Test public void testVectorMixed2ConversionDense() { testDataFrameScriptInput(schemaMixed2, false, false, false); @@ -214,51 +214,51 @@ public class DataFrameVectorScriptTest extends AutomatedTestBase public void testVectorStringsConversionSparseUnknown() { testDataFrameScriptInput(schemaStrings, false, true, true); } - + @Test public void testVectorDoublesConversionSparseUnknown() { testDataFrameScriptInput(schemaDoubles, false, true, true); } - + @Test public void testVectorMixed1ConversionSparseUnknown() { testDataFrameScriptInput(schemaMixed1, false, true, true); } - + @Test public void testVectorMixed2ConversionSparseUnknown() { testDataFrameScriptInput(schemaMixed2, false, true, true); } - + @Test public void testVectorStringsConversionSparse() { testDataFrameScriptInput(schemaStrings, false, true, false); } - + @Test public void testVectorDoublesConversionSparse() { testDataFrameScriptInput(schemaDoubles, false, true, false); } - + @Test public void testVectorMixed1ConversionSparse() { testDataFrameScriptInput(schemaMixed1, false, true, false); } - + @Test public void testVectorMixed2ConversionSparse() { testDataFrameScriptInput(schemaMixed2, false, true, false); } private void testDataFrameScriptInput(ValueType[] schema, boolean containsID, boolean dense, boolean unknownDims) { - + //TODO fix inconsistency ml context vs jmlc register Xf try { //generate input data and setup metadata int cols = schema.length + colsVector - 1; - double sparsity = dense ? 
sparsity1 : sparsity2; - double[][] A = TestUtils.round(getRandomMatrix(rows1, cols, -10, 1000, sparsity, 2373)); + double sparsity = dense ? sparsity1 : sparsity2; + double[][] A = TestUtils.round(getRandomMatrix(rows1, cols, -10, 1000, sparsity, 2373)); MatrixBlock mbA = DataConverter.convertToMatrixBlock(A); int blksz = ConfigurationManager.getBlocksize(); MatrixCharacteristics mc1 = new MatrixCharacteristics(rows1, cols, blksz, blksz, mbA.getNonZeros()); @@ -281,12 +281,8 @@ public class DataFrameVectorScriptTest extends AutomatedTestBase .in("Xf", df, metaEmpty).out("Xm"); // empty metadata Matrix Xm1 = ml.execute(script1).getMatrix("Xm"); Matrix Xm2 = ml.execute(script2).getMatrix("Xm"); - MatrixBlock mbB1 = Xm1.toBinaryBlockMatrix().getMatrixBlock(); - MatrixBlock mbB2 = Xm2.toBinaryBlockMatrix().getMatrixBlock(); - - //compare frame blocks - double[][] B1 = DataConverter.convertToDoubleMatrix(mbB1); - double[][] B2 = DataConverter.convertToDoubleMatrix(mbB2); + double[][] B1 = Xm1.to2DDoubleArray(); + double[][] B2 = Xm2.to2DDoubleArray(); TestUtils.compareMatrices(A, B1, rows1, cols, eps); TestUtils.compareMatrices(A, B2, rows1, cols, eps); } @@ -297,14 +293,14 @@ public class DataFrameVectorScriptTest extends AutomatedTestBase } @SuppressWarnings("resource") - private Dataset<Row> createDataFrame(SparkSession sparkSession, MatrixBlock mb, boolean containsID, ValueType[] schema) + private Dataset<Row> createDataFrame(SparkSession sparkSession, MatrixBlock mb, boolean containsID, ValueType[] schema) throws DMLRuntimeException { //create in-memory list of rows - List<Row> list = new ArrayList<Row>(); + List<Row> list = new ArrayList<Row>(); int off = (containsID ? 
1 : 0); int clen = mb.getNumColumns() + off - colsVector + 1; - + for( int i=0; i<mb.getNumRows(); i++ ) { Object[] row = new Object[clen]; if( containsID ) @@ -323,11 +319,11 @@ public class DataFrameVectorScriptTest extends AutomatedTestBase } list.add(RowFactory.create(row)); } - + //create data frame schema List<StructField> fields = new ArrayList<StructField>(); if( containsID ) - fields.add(DataTypes.createStructField(RDDConverterUtils.DF_ID_COLUMN, + fields.add(DataTypes.createStructField(RDDConverterUtils.DF_ID_COLUMN, DataTypes.DoubleType, true)); for( int j=0; j<schema.length; j++ ) { DataType dt = null; @@ -341,7 +337,7 @@ public class DataFrameVectorScriptTest extends AutomatedTestBase fields.add(DataTypes.createStructField("C"+(j+1), dt, true)); } StructType dfSchema = DataTypes.createStructType(fields); - + //create rdd and data frame JavaSparkContext sc = new JavaSparkContext(sparkSession.sparkContext()); JavaRDD<Row> rowRDD = sc.parallelize(list); http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/1173a5cb/src/test/java/org/apache/sysml/test/integration/mlcontext/MLContextOutputBlocksizeTest.java ---------------------------------------------------------------------- diff --git a/src/test/java/org/apache/sysml/test/integration/mlcontext/MLContextOutputBlocksizeTest.java b/src/test/java/org/apache/sysml/test/integration/mlcontext/MLContextOutputBlocksizeTest.java index c521173..fbc413b 100644 --- a/src/test/java/org/apache/sysml/test/integration/mlcontext/MLContextOutputBlocksizeTest.java +++ b/src/test/java/org/apache/sysml/test/integration/mlcontext/MLContextOutputBlocksizeTest.java @@ -6,9 +6,9 @@ * to you under the Apache License, Version 2.0 (the * "License"); you may not use this file except in compliance * with the License. 
 * You may obtain a copy of the License at
- * 
+ *
 * http://www.apache.org/licenses/LICENSE-2.0
- * 
+ *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -24,10 +24,11 @@ import static org.apache.sysml.api.mlcontext.ScriptFactory.dml;
 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.JavaPairRDD;
 import org.apache.spark.api.java.JavaSparkContext;
-import org.apache.sysml.api.mlcontext.BinaryBlockMatrix;
 import org.apache.sysml.api.mlcontext.MLContext;
 import org.apache.sysml.api.mlcontext.MLContext.ExplainLevel;
 import org.apache.sysml.api.mlcontext.MLResults;
+import org.apache.sysml.api.mlcontext.Matrix;
+import org.apache.sysml.api.mlcontext.MatrixMetadata;
 import org.apache.sysml.api.mlcontext.Script;
 import org.apache.sysml.conf.ConfigurationManager;
 import org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext;
@@ -43,7 +44,7 @@
 import org.junit.BeforeClass;
 import org.junit.Test;
 
-public class MLContextOutputBlocksizeTest extends AutomatedTestBase 
+public class MLContextOutputBlocksizeTest extends AutomatedTestBase
 {
 	protected final static String TEST_DIR = "org/apache/sysml/api/mlcontext";
 	protected final static String TEST_NAME = "MLContext";
@@ -51,7 +52,7 @@ public class MLContextOutputBlocksizeTest extends AutomatedTestBase
 	private final static int rows = 100;
 	private final static int cols = 63;
 	private final static double sparsity = 0.7;
-	
+
 	private static SparkConf conf;
 	private static JavaSparkContext sc;
 	private static MLContext ml;
@@ -77,47 +78,47 @@ public class MLContextOutputBlocksizeTest extends AutomatedTestBase
 	public void testOutputBlocksizeTextcell() {
 		runMLContextOutputBlocksizeTest("text");
 	}
-	
+
 	@Test
 	public void testOutputBlocksizeCSV() {
 		runMLContextOutputBlocksizeTest("csv");
 	}
-	
+
 	@Test
 	public void testOutputBlocksizeMM() {
 		runMLContextOutputBlocksizeTest("mm");
 	}
-	
+
 	@Test
 	public void testOutputBlocksizeBinary() {
 		runMLContextOutputBlocksizeTest("binary");
 	}
-	
-	
-	private void runMLContextOutputBlocksizeTest(String format) 
+
+
+	private void runMLContextOutputBlocksizeTest(String format)
 	{
 		try
 		{
-			double[][] A = getRandomMatrix(rows, cols, -10, 10, sparsity, 76543); 
-			MatrixBlock mbA = DataConverter.convertToMatrixBlock(A); 
+			double[][] A = getRandomMatrix(rows, cols, -10, 10, sparsity, 76543);
+			MatrixBlock mbA = DataConverter.convertToMatrixBlock(A);
 			int blksz = ConfigurationManager.getBlocksize();
 			MatrixCharacteristics mc = new MatrixCharacteristics(rows, cols, blksz, blksz, mbA.getNonZeros());
-			
+
 			//create input dataset
 			JavaPairRDD<MatrixIndexes,MatrixBlock> in = SparkExecutionContext.toMatrixJavaPairRDD(sc, mbA, blksz, blksz);
-			BinaryBlockMatrix bbmatrix = new BinaryBlockMatrix(in, mc);
+			Matrix m = new Matrix(in, new MatrixMetadata(mc));
 			ml.setExplain(true);
 			ml.setExplainLevel(ExplainLevel.HOPS);
-			
+
 			//execute script
 			String s ="if( sum(X) > 0 )"
 					+ "   X = X/2;"
 					+ "R = X;"
 					+ "write(R, \"/tmp\", format=\""+format+"\");";
-			Script script = dml(s).in("X", bbmatrix).out("R");
+			Script script = dml(s).in("X", m).out("R");
 			MLResults results = ml.execute(script);
-			
+
 			//compare output matrix characteristics
 			MatrixCharacteristics mcOut = results.getMatrix("R")
 				.getMatrixMetadata().asMatrixCharacteristics();

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/1173a5cb/src/test/java/org/apache/sysml/test/integration/mlcontext/MLContextTest.java
----------------------------------------------------------------------
diff --git a/src/test/java/org/apache/sysml/test/integration/mlcontext/MLContextTest.java b/src/test/java/org/apache/sysml/test/integration/mlcontext/MLContextTest.java
index c1d44b7..852df72 100644
--- a/src/test/java/org/apache/sysml/test/integration/mlcontext/MLContextTest.java
+++ b/src/test/java/org/apache/sysml/test/integration/mlcontext/MLContextTest.java
@@ -6,9 +6,9 @@
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
- * 
+ *
 * http://www.apache.org/licenses/LICENSE-2.0
- * 
+ *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -43,6 +43,7 @@ import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 
+import org.apache.spark.api.java.JavaPairRDD;
 import org.apache.spark.api.java.JavaRDD;
 import org.apache.spark.api.java.JavaSparkContext;
 import org.apache.spark.api.java.function.Function;
@@ -57,18 +58,21 @@ import org.apache.spark.sql.SparkSession;
 import org.apache.spark.sql.types.DataTypes;
 import org.apache.spark.sql.types.StructField;
 import org.apache.spark.sql.types.StructType;
-import org.apache.sysml.api.mlcontext.BinaryBlockMatrix;
 import org.apache.sysml.api.mlcontext.MLContext;
 import org.apache.sysml.api.mlcontext.MLContextConversionUtil;
 import org.apache.sysml.api.mlcontext.MLContextException;
 import org.apache.sysml.api.mlcontext.MLContextUtil;
 import org.apache.sysml.api.mlcontext.MLResults;
+import org.apache.sysml.api.mlcontext.Matrix;
 import org.apache.sysml.api.mlcontext.MatrixFormat;
 import org.apache.sysml.api.mlcontext.MatrixMetadata;
 import org.apache.sysml.api.mlcontext.Script;
 import org.apache.sysml.api.mlcontext.ScriptExecutor;
 import org.apache.sysml.runtime.controlprogram.caching.MatrixObject;
 import org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtils;
+import org.apache.sysml.runtime.matrix.MatrixCharacteristics;
+import org.apache.sysml.runtime.matrix.data.MatrixBlock;
+import org.apache.sysml.runtime.matrix.data.MatrixIndexes;
 import org.apache.sysml.test.integration.AutomatedTestBase;
 import org.junit.After;
 import org.junit.AfterClass;
@@ -1650,8 +1654,8 @@ public class MLContextTest extends AutomatedTestBase {
 	}
 
 	@Test
-	public void testInputBinaryBlockMatrixDML() {
-		System.out.println("MLContextTest - input BinaryBlockMatrix DML");
+	public void testInputMatrixBlockDML() {
+		System.out.println("MLContextTest - input MatrixBlock DML");
 
 		List<String> list = new ArrayList<String>();
 		list.add("10,20,30");
@@ -1667,15 +1671,16 @@ public class MLContextTest extends AutomatedTestBase {
 		StructType schema = DataTypes.createStructType(fields);
 		Dataset<Row> dataFrame = spark.createDataFrame(javaRddRow, schema);
 
-		BinaryBlockMatrix binaryBlockMatrix = new BinaryBlockMatrix(dataFrame);
-		Script script = dml("avg = avg(M);").in("M", binaryBlockMatrix).out("avg");
+		Matrix m = new Matrix(dataFrame);
+		MatrixBlock matrixBlock = m.toMatrixBlock();
+		Script script = dml("avg = avg(M);").in("M", matrixBlock).out("avg");
 		double avg = ml.execute(script).getDouble("avg");
 		Assert.assertEquals(50.0, avg, 0.0);
 	}
 
 	@Test
-	public void testInputBinaryBlockMatrixPYDML() {
-		System.out.println("MLContextTest - input BinaryBlockMatrix PYDML");
+	public void testInputMatrixBlockPYDML() {
+		System.out.println("MLContextTest - input MatrixBlock PYDML");
 
 		List<String> list = new ArrayList<String>();
 		list.add("10,20,30");
@@ -1691,20 +1696,24 @@ public class MLContextTest extends AutomatedTestBase {
 		StructType schema = DataTypes.createStructType(fields);
 		Dataset<Row> dataFrame = spark.createDataFrame(javaRddRow, schema);
 
-		BinaryBlockMatrix binaryBlockMatrix = new BinaryBlockMatrix(dataFrame);
-		Script script = pydml("avg = avg(M)").in("M", binaryBlockMatrix).out("avg");
+		Matrix m = new Matrix(dataFrame);
+		MatrixBlock matrixBlock = m.toMatrixBlock();
+		Script script = pydml("avg = avg(M)").in("M", matrixBlock).out("avg");
 		double avg = ml.execute(script).getDouble("avg");
 		Assert.assertEquals(50.0, avg, 0.0);
 	}
 
 	@Test
-	public void testOutputBinaryBlockMatrixDML() {
-		System.out.println("MLContextTest - output BinaryBlockMatrix DML");
+	public void testOutputBinaryBlocksDML() {
+		System.out.println("MLContextTest - output binary blocks DML");
 		String s = "M = matrix('1 2 3 4', rows=2, cols=2);";
-		BinaryBlockMatrix binaryBlockMatrix = ml.execute(dml(s).out("M")).getBinaryBlockMatrix("M");
+		MLResults results = ml.execute(dml(s).out("M"));
+		Matrix m = results.getMatrix("M");
+		JavaPairRDD<MatrixIndexes, MatrixBlock> binaryBlocks = m.toBinaryBlocks();
+		MatrixMetadata mm = m.getMatrixMetadata();
+		MatrixCharacteristics mc = mm.asMatrixCharacteristics();
+		JavaRDD<String> javaRDDStringIJV = RDDConverterUtils.binaryBlockToTextCell(binaryBlocks, mc);
 
-		JavaRDD<String> javaRDDStringIJV = MLContextConversionUtil
-				.binaryBlockMatrixToJavaRDDStringIJV(binaryBlockMatrix);
 		List<String> lines = javaRDDStringIJV.collect();
 		Assert.assertEquals("1 1 1.0", lines.get(0));
 		Assert.assertEquals("1 2 2.0", lines.get(1));
@@ -1713,13 +1722,16 @@ public class MLContextTest extends AutomatedTestBase {
 	}
 
 	@Test
-	public void testOutputBinaryBlockMatrixPYDML() {
-		System.out.println("MLContextTest - output BinaryBlockMatrix PYDML");
+	public void testOutputBinaryBlocksPYDML() {
+		System.out.println("MLContextTest - output binary blocks PYDML");
 		String s = "M = full('1 2 3 4', rows=2, cols=2);";
-		BinaryBlockMatrix binaryBlockMatrix = ml.execute(pydml(s).out("M")).getBinaryBlockMatrix("M");
+		MLResults results = ml.execute(pydml(s).out("M"));
+		Matrix m = results.getMatrix("M");
+		JavaPairRDD<MatrixIndexes, MatrixBlock> binaryBlocks = m.toBinaryBlocks();
+		MatrixMetadata mm = m.getMatrixMetadata();
+		MatrixCharacteristics mc = mm.asMatrixCharacteristics();
+		JavaRDD<String> javaRDDStringIJV = RDDConverterUtils.binaryBlockToTextCell(binaryBlocks, mc);
 
-		JavaRDD<String> javaRDDStringIJV = MLContextConversionUtil
-				.binaryBlockMatrixToJavaRDDStringIJV(binaryBlockMatrix);
 		List<String> lines = javaRDDStringIJV.collect();
 		Assert.assertEquals("1 1 1.0", lines.get(0));
 		Assert.assertEquals("1 2 2.0", lines.get(1));
