This is an automated email from the ASF dual-hosted git repository.

janardhan pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/systemds.git

The following commit(s) were added to refs/heads/main by this push:
     new 16a2855  [SYSTEMDS-439] Builtins for set operations on vectors
16a2855 is described below

commit 16a2855295a086d5ca839448df9b3022ed7366fc
Author: David Fleischhacker <david.fleischhac...@student.tugraz.at>
AuthorDate: Tue Dec 14 09:27:58 2021 +0530

    [SYSTEMDS-439] Builtins for set operations on vectors

    Illustration of the implementation:

    X = matrix("1 2 3.1 4", rows=2, cols=2)
    Y = matrix("3.1 4 5 6", rows=2, cols=2)

    union(X, Y): Union of the sets X and Y
      1 2 3.1 4 5 6

    setdiff(X, Y): Set difference between X, and Y, with elements in X but not in Y.
      1 2

    symmetricDifference(X,Y): Set difference between X, and Y, with elements in X and Y but not in both.
      1 2 5 6

    unique(X): Unique elements of the set X
      1 2 3.1 4

    Future work: also to support string elements.

    These operations are helpful for bridging the gap between the
    relational and linear algebra.

    Resolves SYSTEMDS-440, SYSTEMDS-441, SYSTEMDS-442, SYSTEMDS-3183.
    Closes #1479.
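
For readers less familiar with the set semantics, the four results listed in the illustration can be reproduced with a small standalone Java sketch. It is illustrative only: `SetOpsSketch` and its method names are hypothetical and not part of SystemDS, it treats the inputs as flat collections of values, and it relies on `java.util.TreeSet` rather than the sort-and-compare-neighbours approach used by the DML scripts in this commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Illustrative sketch only (hypothetical class, not SystemDS code):
// mirrors the set semantics described in the commit message.
final class SetOpsSketch {
    // Collect the values into a sorted set, which drops duplicates.
    static TreeSet<Double> toSet(double[] values) {
        TreeSet<Double> set = new TreeSet<>();
        for (double v : values) {
            set.add(v);
        }
        return set;
    }

    // Distinct values of x, in ascending order.
    static List<Double> unique(double[] x) {
        return new ArrayList<>(toSet(x));
    }

    // Values that occur in x or in y.
    static List<Double> union(double[] x, double[] y) {
        TreeSet<Double> result = toSet(x);
        result.addAll(toSet(y));
        return new ArrayList<>(result);
    }

    // Values that occur in x but not in y.
    static List<Double> setdiff(double[] x, double[] y) {
        TreeSet<Double> result = toSet(x);
        result.removeAll(toSet(y));
        return new ArrayList<>(result);
    }

    // Values that occur in exactly one of x and y:
    // the union minus the intersection.
    static List<Double> symmetricDifference(double[] x, double[] y) {
        TreeSet<Double> result = toSet(x);
        result.addAll(toSet(y));
        TreeSet<Double> common = toSet(x);
        common.retainAll(toSet(y));
        result.removeAll(common);
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3.1, 4};
        double[] y = {3.1, 4, 5, 6};
        System.out.println(union(x, y));               // [1.0, 2.0, 3.1, 4.0, 5.0, 6.0]
        System.out.println(setdiff(x, y));             // [1.0, 2.0]
        System.out.println(symmetricDifference(x, y)); // [1.0, 2.0, 5.0, 6.0]
        System.out.println(unique(x));                 // [1.0, 2.0, 3.1, 4.0]
    }
}
```

The symmetric difference is built as union minus intersection, which is the same composition the commit's `symmetricDifference.dml` uses.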

    Co-authored-by: David Fleischhacker <david.fleischhac...@student.tugraz.at>
    Co-authored-by: Joachim Dunkel <dun...@student.tugraz.at>
---
 docs/site/builtins-reference.md                    | 125 +++++++++++++++
 scripts/builtin/intersect.dml                      |  16 +-
 scripts/builtin/{intersect.dml => setdiff.dml}     |  33 ++--
 .../{intersect.dml => symmetricDifference.dml}     |  26 ++-
 scripts/builtin/{intersect.dml => union.dml}       |  27 ++--
 scripts/builtin/{intersect.dml => unique.dml}      |  32 ++--
 .../java/org/apache/sysds/common/Builtins.java     |   4 +
 .../test/functions/builtin/BuiltinUniqueTest.java  | 114 ++++++++++++++
 .../builtin/part1/BuiltinIntersectionTest.java     | 104 ------------
 .../setoperations/BuiltinIntersectionTest.java     |  41 +++++
 .../builtin/setoperations/BuiltinSetDiffTest.java  |  36 +++++
 .../BuiltinSymmetricDifferenceTest.java            |  32 ++++
 .../builtin/setoperations/BuiltinUnionTest.java    |  39 +++++
 .../setoperations/SetOperationsTestBase.java       | 175 +++++++++++++++++++++
 .../builtin/{intersection.dml => intersect.R}      |  12 +-
 .../builtin/{intersection.dml => intersect.dml}    |   0
 .../builtin/{intersection.dml => setdiff.R}        |  12 +-
 .../builtin/{intersection.dml => setdiff.dml}      |   8 +-
 .../{intersection.dml => symmetricDifference.R}    |  15 +-
 .../{intersection.dml => symmetricDifference.dml}  |   8 +-
 .../builtin/{intersection.dml => union.R}          |  13 +-
 .../builtin/{intersection.dml => union.dml}        |   8 +-
 .../builtin/{intersection.dml => unique.R}         |  10 +-
 .../builtin/{intersection.dml => unique.dml}       |   7 +-
 24 files changed, 686 insertions(+), 211 deletions(-)

diff --git a/docs/site/builtins-reference.md b/docs/site/builtins-reference.md
index da70c49..9c9bb32 100644
--- a/docs/site/builtins-reference.md
+++ b/docs/site/builtins-reference.md
@@ -71,15 +71,19 @@ limitations under the License.
   * [`outlier`-Function](#outlier-function)
   * [`pnmf`-Function](#pnmf-function)
   * [`scale`-Function](#scale-function)
+  * [`setdiff`-Function](#setdiff-function)
   * [`sherlock`-Function](#sherlock-function)
   * [`sherlockPredict`-Function](#sherlockPredict-function)
   * [`sigmoid`-Function](#sigmoid-function)
   * [`slicefinder`-Function](#slicefinder-function)
   * [`smote`-Function](#smote-function)
   * [`steplm`-Function](#steplm-function)
+  * [`symmetricDifference`-Function](#symmetricdifference-function)
   * [`tomekLink`-Function](#tomekLink-function)
   * [`toOneHot`-Function](#toOneHOt-function)
   * [`tSNE`-Function](#tSNE-function)
+  * [`union`-Function](#union-function)
+  * [`unique`-Function](#unique-function)
   * [`winsorize`-Function](#winsorize-function)
   * [`xgboost`-Function](#xgboost-function)
 
@@ -1823,6 +1827,36 @@ scale=TRUE;
 Y= scale(X,center,scale)
 ```
 
+## `setdiff`-Function
+
+The `setdiff`-function returns the values of X that are not in Y.
+
+### Usage
+
+```r
+setdiff(X, Y)
+```
+
+### Arguments
+
+| Name | Type | Default | Description |
+| :--- | :----- | -------- | :---------- |
+| X | Matrix[Double] | required | input vector|
+| Y | Matrix[Double] | required | input vector|
+
+### Returns
+
+| Type | Description |
+| :----- | :---------- |
+| Matrix[Double] | values of X that are not in Y.|
+
+### Example
+
+```r
+X = matrix("1 2 3 4", rows = 4, cols = 1)
+Y = matrix("2 3", rows = 2, cols = 1)
+R = setdiff(X = X, Y = Y)
+```
 
 ## `sherlock`-Function
 
@@ -2107,6 +2141,37 @@ y = X %*% rand(rows = ncol(X), cols = 1)
 [C, S] = steplm(X = X, y = y, icpt = 1);
 ```
 
+## `symmetricDifference`-Function
+
+The `symmetricDifference`-function returns the symmetric difference of the two input vectors.
+This is done by calculating the `setdiff` (nonsymmetric) between `union` and `intersect` of the two input vectors.
+
+### Usage
+
+```r
+symmetricDifference(X, Y)
+```
+
+### Arguments
+
+| Name | Type | Default | Description |
+| :--- | :----- | -------- | :---------- |
+| X | Matrix[Double] | required | input vector|
+| Y | Matrix[Double] | required | input vector|
+
+### Returns
+
+| Type | Description |
+| :----- | :---------- |
+| Matrix[Double] | symmetric difference of the input vectors |
+
+### Example
+
+```r
+X = matrix("1 2 3.1", rows = 3, cols = 1)
+Y = matrix("3.1 4", rows = 2, cols = 1)
+R = symmetricDifference(X = X, Y = Y)
+```
 
 ## `tomekLink`-Function
 
@@ -2212,6 +2277,66 @@ X = rand(rows = 100, cols = 10, min = -10, max = 10))
 Y = tSNE(X)
 ```
 
+## `union`-Function
+
+The `union`-function combines all rows from both input vectors and removes all duplicate rows by calling `unique` on the resulting vector.
+
+### Usage
+
+```r
+union(X, Y)
+```
+
+### Arguments
+
+| Name | Type | Default | Description |
+| :--- | :----- | -------- | :---------- |
+| X | Matrix[Double] | required | input vector|
+| Y | Matrix[Double] | required | input vector|
+
+### Returns
+
+| Type | Description |
+| :----- | :---------- |
+| Matrix[Double] | the union of both input vectors.|
+
+### Example
+
+```r
+X = matrix("1 2 3 4", rows = 4, cols = 1)
+Y = matrix("3 4 5 6", rows = 4, cols = 1)
+R = union(X = X, Y = Y)
+```
+
+## `unique`-Function
+
+The `unique`-function returns a set of unique rows from a given input vector.
+
+### Usage
+
+```r
+unique(X)
+```
+
+### Arguments
+
+| Name | Type | Default | Description |
+| :--- | :----- | -------- | :---------- |
+| X | Matrix[Double] | required | input vector|
+
+### Returns
+
+| Type | Description |
+| :----- | :---------- |
+| Matrix[Double] | a set of unique values from the input vector |
+
+### Example
+
+```r
+X = matrix("1 3.4 7 3.4 -0.9 8 1", rows = 7, cols = 1)
+R = unique(X = X)
+```
+
 ## `winsorize`-Function
 
 The `winsorize`-function removes outliers from the data. It does so by computing upper and lower quartile range
diff --git a/scripts/builtin/intersect.dml b/scripts/builtin/intersect.dml
index 23edc09..bf16c60 100644
--- a/scripts/builtin/intersect.dml
+++ b/scripts/builtin/intersect.dml
@@ -39,12 +39,14 @@
 m_intersect = function(Matrix[Double] X, Matrix[Double] Y)
   return(Matrix[Double] R)
 {
-  # compute indicator vector of intersection output
-  X = (table(X, 1) != 0)
-  Y = (table(Y, 1) != 0)
-  n = min(nrow(X), nrow(Y))
-  I = X[1:n,] * Y[1:n,]
+  X = unique(X);
+  Y = unique(Y);
 
-  # reconstruct integer values and create output
-  R = removeEmpty(target=seq(1,n), margin="rows", select=I)
+  combined = rbind(X, Y);
+
+  combined = order(target=combined, by=1, decreasing=FALSE, index.return=FALSE);
+  temp = combined[1:nrow(combined)-1,] != combined[2:nrow(combined),];
+  mask = rbind(matrix(1, rows = 1, cols = 1), rowSums(temp));
+
+  R = removeEmpty(target = combined, margin = "rows", select = !mask);
 }
diff --git a/scripts/builtin/intersect.dml b/scripts/builtin/setdiff.dml
similarity index 62%
copy from scripts/builtin/intersect.dml
copy to scripts/builtin/setdiff.dml
index 23edc09..72b01f8 100644
--- a/scripts/builtin/intersect.dml
+++ b/scripts/builtin/setdiff.dml
@@ -19,32 +19,35 @@
 #
 #-------------------------------------------------------------
 
-# Implements set intersection for numeric data
-
+# Builtin function that implements difference operation on vectors
 # INPUT PARAMETERS:
 # ---------------------------------------------------------------------------------------------
 # NAME TYPE DEFAULT MEANING
 # ---------------------------------------------------------------------------------------------
-# X Double --- matrix X, set A
-# Y Double --- matrix Y, set B
+# X Matrix --- input vector
+# ---------------------------------------------------------------------------------------------
+# Y Matrix --- input vector
 # ---------------------------------------------------------------------------------------------
-
+ 
 # Output(s)
 # ---------------------------------------------------------------------------------------------
 # NAME TYPE DEFAULT MEANING
 # ---------------------------------------------------------------------------------------------
-# R Double --- intersection matrix, set of intersecting items
+# R Matrix --- vector with all elements that are present in X but not in Y
+
 
 
-m_intersect = function(Matrix[Double] X, Matrix[Double] Y)
-  return(Matrix[Double] R)
+setdiff = function(Matrix[double] X, Matrix[double] Y)
+  return (matrix[double] R)
 {
-  # compute indicator vector of intersection output
-  X = (table(X, 1) != 0)
-  Y = (table(Y, 1) != 0)
-  n = min(nrow(X), nrow(Y))
-  I = X[1:n,] * Y[1:n,]
+  common = intersect(X, Y);
+  X = unique(X);
+  combined = rbind(X, common);
+  combined = order(target=combined, by=1, decreasing=FALSE, index.return=FALSE);
+  temp = combined[1:nrow(combined)-1,] != combined[2:nrow(combined),];
+  mask1 = rbind(rowSums(temp), matrix(1, rows=1, cols=1));
+  mask2 = rbind(matrix(1, rows = 1, cols = 1), rowSums(temp));
 
-  # reconstruct integer values and create output
-  R = removeEmpty(target=seq(1,n), margin="rows", select=I)
+  mask = mask1 & mask2;
+  R = removeEmpty(target = combined, margin = "rows", select = mask);
 }
diff --git a/scripts/builtin/intersect.dml b/scripts/builtin/symmetricDifference.dml
similarity index 72%
copy from scripts/builtin/intersect.dml
copy to scripts/builtin/symmetricDifference.dml
index 23edc09..77f1a92 100644
--- a/scripts/builtin/intersect.dml
+++ b/scripts/builtin/symmetricDifference.dml
@@ -19,32 +19,26 @@
 #
 #-------------------------------------------------------------
 
-# Implements set intersection for numeric data
-
+# Builtin function that implements symmetric difference set-operation on vectors
 # INPUT PARAMETERS:
 # ---------------------------------------------------------------------------------------------
 # NAME TYPE DEFAULT MEANING
 # ---------------------------------------------------------------------------------------------
-# X Double --- matrix X, set A
-# Y Double --- matrix Y, set B
+# X Matrix --- input vector
+# ---------------------------------------------------------------------------------------------
+# Y Matrix --- input vector
 # ---------------------------------------------------------------------------------------------
-
+ 
 # Output(s)
 # ---------------------------------------------------------------------------------------------
 # NAME TYPE DEFAULT MEANING
 # ---------------------------------------------------------------------------------------------
-# R Double --- intersection matrix, set of intersecting items
+# R Matrix --- vector with all elements in X and Y but not in both
 
 
-m_intersect = function(Matrix[Double] X, Matrix[Double] Y)
-  return(Matrix[Double] R)
-{
-  # compute indicator vector of intersection output
-  X = (table(X, 1) != 0)
-  Y = (table(Y, 1) != 0)
-  n = min(nrow(X), nrow(Y))
-  I = X[1:n,] * Y[1:n,]
 
-  # reconstruct integer values and create output
-  R = removeEmpty(target=seq(1,n), margin="rows", select=I)
+symmetricDifference = function(Matrix[Double] X, Matrix[Double] Y)
+  return (matrix[double] R)
+{
+  R = setdiff(union(X,Y), intersect(X,Y))
 }
diff --git a/scripts/builtin/intersect.dml b/scripts/builtin/union.dml
similarity index 72%
copy from scripts/builtin/intersect.dml
copy to scripts/builtin/union.dml
index 23edc09..fa3609b 100644
--- a/scripts/builtin/intersect.dml
+++ b/scripts/builtin/union.dml
@@ -19,32 +19,27 @@
 #
 #-------------------------------------------------------------
 
-# Implements set intersection for numeric data
-
+# Builtin function that implements union operation on vectors
 # INPUT PARAMETERS:
 # ---------------------------------------------------------------------------------------------
 # NAME TYPE DEFAULT MEANING
 # ---------------------------------------------------------------------------------------------
-# X Double --- matrix X, set A
-# Y Double --- matrix Y, set B
+# X Matrix --- input vector
+# ---------------------------------------------------------------------------------------------
+# Y Matrix --- input vector
 # ---------------------------------------------------------------------------------------------
-
+ 
 # Output(s)
 # ---------------------------------------------------------------------------------------------
 # NAME TYPE DEFAULT MEANING
 # ---------------------------------------------------------------------------------------------
-# R Double --- intersection matrix, set of intersecting items
+# R Matrix --- matrix with all unique rows existing in X and Y
 
 
-m_intersect = function(Matrix[Double] X, Matrix[Double] Y)
-  return(Matrix[Double] R)
-{
-  # compute indicator vector of intersection output
-  X = (table(X, 1) != 0)
-  Y = (table(Y, 1) != 0)
-  n = min(nrow(X), nrow(Y))
-  I = X[1:n,] * Y[1:n,]
 
-  # reconstruct integer values and create output
-  R = removeEmpty(target=seq(1,n), margin="rows", select=I)
+union = function(Matrix[Double] X, Matrix[Double] Y)
+  return (matrix[double] R)
+{
+  combined = rbind(X,Y);
+  R = unique(combined);
 }
diff --git a/scripts/builtin/intersect.dml b/scripts/builtin/unique.dml
similarity index 71%
copy from scripts/builtin/intersect.dml
copy to scripts/builtin/unique.dml
index 23edc09..94699be 100644
--- a/scripts/builtin/intersect.dml
+++ b/scripts/builtin/unique.dml
@@ -19,32 +19,30 @@
 #
 #-------------------------------------------------------------
 
-# Implements set intersection for numeric data
-
+# Builtin function that implements unique operation on vectors
 # INPUT PARAMETERS:
 # ---------------------------------------------------------------------------------------------
 # NAME TYPE DEFAULT MEANING
 # ---------------------------------------------------------------------------------------------
-# X Double --- matrix X, set A
-# Y Double --- matrix Y, set B
+# X Matrix --- input vector
 # ---------------------------------------------------------------------------------------------
-
+ 
 # Output(s)
 # ---------------------------------------------------------------------------------------------
 # NAME TYPE DEFAULT MEANING
 # ---------------------------------------------------------------------------------------------
-# R Double --- intersection matrix, set of intersecting items
-
 
-m_intersect = function(Matrix[Double] X, Matrix[Double] Y)
-  return(Matrix[Double] R)
-{
-  # compute indicator vector of intersection output
-  X = (table(X, 1) != 0)
-  Y = (table(Y, 1) != 0)
-  n = min(nrow(X), nrow(Y))
-  I = X[1:n,] * Y[1:n,]
+# R Matrix --- matrix with only unique rows
 
-  # reconstruct integer values and create output
-  R = removeEmpty(target=seq(1,n), margin="rows", select=I)
+unique = function(matrix[double] X)
+  return (matrix[double] R) {
+  if(nrow(X) > 1) {
+    X_sorted = order(target=X, by=1, decreasing=FALSE, index.return=FALSE);
+    temp = X_sorted[1:nrow(X_sorted)-1,] != X_sorted[2:nrow(X_sorted),];
+    mask = rbind(matrix(1, rows = 1, cols = 1), rowSums(temp));
+    R = removeEmpty(target = X_sorted, margin = "rows", select = mask);
+  }
+  else {
+    R = X
+  }
 }
diff --git a/src/main/java/org/apache/sysds/common/Builtins.java b/src/main/java/org/apache/sysds/common/Builtins.java
index affed9e..f44cf42 100644
--- a/src/main/java/org/apache/sysds/common/Builtins.java
+++ b/src/main/java/org/apache/sysds/common/Builtins.java
@@ -112,6 +112,7 @@ public enum Builtins {
 	DIAG("diag", false),
 	DISCOVER_FD("discoverFD", true),
 	DISCOVER_MD("mdedup", true),
+	SETDIFF("setdiff", true),
 	DIST("dist", true),
 	DMV("dmv", true),
 	DROP_INVALID_TYPE("dropInvalidType", false),
@@ -238,6 +239,7 @@ public enum Builtins {
 	SD("sd", false),
 	SELVARTHRESH("selectByVarThresh", true),
 	SEQ("seq", false),
+	SYMMETRICDIFFERENCE("symmetricDifference", true),
 	SHERLOCK("sherlock", true),
 	SHERLOCKPREDICT("sherlockPredict", true),
 	SHORTESTPATH("shortestPath", true),
@@ -267,7 +269,9 @@ public enum Builtins {
 	TRANS("t", false),
 	TSNE("tSNE", true),
 	TYPEOF("typeof", false),
+	UNIQUE("unique", true),
 	UNIVAR("univar", true),
+	UNION("union", true),
 	VAR("var", false),
 	VALUE_SWAP("valueSwap", false),
 	VECTOR_TO_CSV("vectorToCsv", true),
diff --git a/src/test/java/org/apache/sysds/test/functions/builtin/BuiltinUniqueTest.java b/src/test/java/org/apache/sysds/test/functions/builtin/BuiltinUniqueTest.java
new file mode 100644
index 0000000..7d36d79
--- /dev/null
+++ b/src/test/java/org/apache/sysds/test/functions/builtin/BuiltinUniqueTest.java
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.sysds.test.functions.builtin;
+
+import org.apache.sysds.common.Types;
+import org.apache.sysds.common.Types.ExecType;
+import org.apache.sysds.runtime.matrix.data.MatrixValue;
+import org.apache.sysds.test.AutomatedTestBase;
+import org.apache.sysds.test.TestConfiguration;
+import org.apache.sysds.test.TestUtils;
+import org.junit.Test;
+
+import java.util.HashMap;
+
+public class BuiltinUniqueTest extends AutomatedTestBase {
+	private final static String TEST_NAME = "unique";
+	private final static String TEST_DIR = "functions/builtin/";
+	private static final String TEST_CLASS_DIR = TEST_DIR + BuiltinUniqueTest.class.getSimpleName() + "/";
+
+	@Override
+	public void setUp() {
+		TestUtils.clearAssertionInformation();
+		addTestConfiguration(TEST_NAME, new TestConfiguration(TEST_CLASS_DIR, TEST_NAME, new String[] {"R"}));
+	}
+
+	@Test
+	public void testUnique1CP() {
+		double[][] X = {{1},{1},{6},{9},{4},{2},{0},{9},{0},{0},{4},{4}};
+		runUniqueTest(X, ExecType.CP);
+	}
+
+	@Test
+	public void testUnique1SP() {
+		double[][] X = {{1},{1},{6},{9},{4},{2},{0},{9},{0},{0},{4},{4}};
+		runUniqueTest(X,ExecType.SPARK);
+	}
+
+	@Test
+	public void testUnique2CP() {
+		double[][] X = {{0}};
+		runUniqueTest(X, ExecType.CP);
+	}
+
+	@Test
+	public void testUnique2SP() {
+		double[][] X = {{0}};
+		runUniqueTest(X, ExecType.SPARK);
+	}
+
+	@Test
+	public void testUnique3CP() {
+		double[][] X = {{1, 2, 3}, {2, 3, 4}, {1, 2, 3}};
+		runUniqueTest(X, ExecType.CP);
+	}
+
+//	@Test
+//	public void testUnique3SP() { //This fails?
+//		double[][] X = {{1, 2, 3}, {2, 3, 4}, {1, 2, 3}};
+//		runUniqueTest(X, ExecType.SPARK);
+//	}
+
+	@Test
+	public void testUnique4CP() {
+		double[][] X = {{1.5, 2}, {7, 3}, {1, 3}, {1.5, 2}, {-1, -2.32}, {-1, 0.1}, {1, 3}, {-1, 0.1}};
+		runUniqueTest(X, ExecType.CP);
+	}
+
+//	@Test
+//	public void testUnique4SP() { //This fails?
+// double[][] X = {{1.5, 2}, {7, 3}, {1, 3}, {1.5, 2}, {-1, -2.32}, {-1, 0.1}, {1, 3}, {-1, 0.1}}; +// runUniqueTest(X, ExecType.SPARK); +// } + + private void runUniqueTest(double[][] X, ExecType instType) { + Types.ExecMode platformOld = setExecMode(instType); + try { + loadTestConfiguration(getTestConfiguration(TEST_NAME)); + String HOME = SCRIPT_DIR + TEST_DIR; + fullDMLScriptName = HOME + TEST_NAME + ".dml"; + programArgs = new String[]{ "-args", input("X"), output("R")}; + fullRScriptName = HOME + TEST_NAME + ".R"; + rCmd = "Rscript" + " " + fullRScriptName + " " + inputDir() + " " + expectedDir(); + + writeInputMatrixWithMTD("X", X, true); + + runTest(true, false, null, -1); + runRScript(true); + + HashMap<MatrixValue.CellIndex, Double> dmlfile = readDMLMatrixFromOutputDir("R"); + HashMap<MatrixValue.CellIndex, Double> rfile = readRMatrixFromExpectedDir("R"); + TestUtils.compareMatrices(dmlfile, rfile, 1e-10, "dml", "expected"); + } + finally { + rtplatform = platformOld; + } + } +} diff --git a/src/test/java/org/apache/sysds/test/functions/builtin/part1/BuiltinIntersectionTest.java b/src/test/java/org/apache/sysds/test/functions/builtin/part1/BuiltinIntersectionTest.java deleted file mode 100644 index 78415e2..0000000 --- a/src/test/java/org/apache/sysds/test/functions/builtin/part1/BuiltinIntersectionTest.java +++ /dev/null @@ -1,104 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.sysds.test.functions.builtin.part1; - -import org.junit.Test; -import org.apache.sysds.common.Types; -import org.apache.sysds.common.Types.ExecType; -import org.apache.sysds.runtime.matrix.data.MatrixValue.CellIndex; -import org.apache.sysds.test.AutomatedTestBase; -import org.apache.sysds.test.TestConfiguration; -import org.apache.sysds.test.TestUtils; - -import java.util.HashMap; - -public class BuiltinIntersectionTest extends AutomatedTestBase -{ - private final static String TEST_NAME = "intersection"; - private final static String TEST_DIR = "functions/builtin/"; - private static final String TEST_CLASS_DIR = TEST_DIR + BuiltinIntersectionTest.class.getSimpleName() + "/"; - - @Override - public void setUp() { - TestUtils.clearAssertionInformation(); - addTestConfiguration(TEST_NAME,new TestConfiguration(TEST_CLASS_DIR, TEST_NAME,new String[]{"C"})); - } - - @Test - public void testIntersect1CP() { - double[][] X = {{12},{22},{13},{4},{6},{7},{8},{9},{12},{12}}; - double[][] Y = {{1},{2},{11},{12},{13},{18},{20},{21},{12}}; - double[][] expected = {{12},{13}}; - runIntersectTest(X, Y, expected, ExecType.CP); - } - - @Test - public void testIntersect1Spark() { - double[][] X = {{12},{22},{13},{4},{6},{7},{8},{9},{12},{12}}; - double[][] Y = {{1},{2},{11},{12},{13},{18},{20},{21},{12}}; - double[][] expected = {{12},{13}}; - runIntersectTest(X, Y, expected, ExecType.SPARK); - } - - @Test - public void testIntersect2CP() { - double[][] X = TestUtils.seq(2, 200, 4); - double[][] Y = TestUtils.seq(2, 100, 2); - double[][] 
expected = TestUtils.seq(2, 100, 4); - runIntersectTest(X, Y, expected, ExecType.CP); - } - - @Test - public void testIntersect2Spark() { - double[][] X = TestUtils.seq(2, 200, 4); - double[][] Y = TestUtils.seq(2, 100, 2); - double[][] expected = TestUtils.seq(2, 100, 4); - runIntersectTest(X, Y, expected, ExecType.SPARK); - } - - private void runIntersectTest(double X[][], double Y[][], double[][] expected, ExecType instType) - { - Types.ExecMode platformOld = setExecMode(instType); - - try { - loadTestConfiguration(getTestConfiguration(TEST_NAME)); - String HOME = SCRIPT_DIR + TEST_DIR; - fullDMLScriptName = HOME + TEST_NAME + ".dml"; - programArgs = new String[]{ "-args", input("X"), input("Y"), output("C"), output("X")}; - - //generate actual datasets - writeInputMatrixWithMTD("X", X, true); - writeInputMatrixWithMTD("Y", Y, true); - - //run test - runTest(true, false, null, -1); - - //compare expected results - HashMap<CellIndex, Double> R = new HashMap<>(); - for(int i=0; i<expected.length; i++) - R.put(new CellIndex(i+1,1), expected[i][0]); - HashMap<CellIndex, Double> dmlfile = readDMLMatrixFromOutputDir("C"); - TestUtils.compareMatrices(dmlfile, R, 1e-10, "dml", "expected"); - } - finally { - rtplatform = platformOld; - } - } -} diff --git a/src/test/java/org/apache/sysds/test/functions/builtin/setoperations/BuiltinIntersectionTest.java b/src/test/java/org/apache/sysds/test/functions/builtin/setoperations/BuiltinIntersectionTest.java new file mode 100644 index 0000000..a78bc5f --- /dev/null +++ b/src/test/java/org/apache/sysds/test/functions/builtin/setoperations/BuiltinIntersectionTest.java @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.sysds.test.functions.builtin.setoperations; + +import org.junit.Test; +import org.apache.sysds.common.Types; +import org.apache.sysds.common.Types.ExecType; +import org.apache.sysds.runtime.matrix.data.MatrixValue.CellIndex; +import org.apache.sysds.test.AutomatedTestBase; +import org.apache.sysds.test.TestConfiguration; +import org.apache.sysds.test.TestUtils; + +import java.util.HashMap; + +public class BuiltinIntersectionTest extends SetOperationsTestBase +{ + private final static String TEST_NAME = "intersect"; + private final static String TEST_DIR = "functions/builtin/"; + private static final String TEST_CLASS_DIR = TEST_DIR + BuiltinIntersectionTest.class.getSimpleName() + "/"; + + public BuiltinIntersectionTest(Types.ExecType execType) { + super(TEST_NAME, TEST_DIR, TEST_CLASS_DIR, execType); + } +} diff --git a/src/test/java/org/apache/sysds/test/functions/builtin/setoperations/BuiltinSetDiffTest.java b/src/test/java/org/apache/sysds/test/functions/builtin/setoperations/BuiltinSetDiffTest.java new file mode 100644 index 0000000..62a4787 --- /dev/null +++ b/src/test/java/org/apache/sysds/test/functions/builtin/setoperations/BuiltinSetDiffTest.java @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.sysds.test.functions.builtin.setoperations; + +import org.apache.sysds.common.Types; + +import org.junit.runner.RunWith; +import org.junit.runners.Parameterized; + +@RunWith(Parameterized.class) +public class BuiltinSetDiffTest extends SetOperationsTestBase { + private final static String TEST_NAME = "setdiff"; + private final static String TEST_DIR = "functions/builtin/"; + private static final String TEST_CLASS_DIR = TEST_DIR + BuiltinSetDiffTest.class.getSimpleName() + "/"; + + public BuiltinSetDiffTest(Types.ExecType execType){ + super(TEST_NAME, TEST_DIR, TEST_CLASS_DIR, execType); + } +} \ No newline at end of file diff --git a/src/test/java/org/apache/sysds/test/functions/builtin/setoperations/BuiltinSymmetricDifferenceTest.java b/src/test/java/org/apache/sysds/test/functions/builtin/setoperations/BuiltinSymmetricDifferenceTest.java new file mode 100644 index 0000000..09afad2 --- /dev/null +++ b/src/test/java/org/apache/sysds/test/functions/builtin/setoperations/BuiltinSymmetricDifferenceTest.java @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.sysds.test.functions.builtin.setoperations; + +import org.apache.sysds.common.Types; + +public class BuiltinSymmetricDifferenceTest extends SetOperationsTestBase { + private final static String TEST_NAME = "symmetricDifference"; + private final static String TEST_DIR = "functions/builtin/"; + private static final String TEST_CLASS_DIR = TEST_DIR + BuiltinSymmetricDifferenceTest.class.getSimpleName() + "/"; + + public BuiltinSymmetricDifferenceTest(Types.ExecType execType) { + super(TEST_NAME, TEST_DIR, TEST_CLASS_DIR, execType); + } +} diff --git a/src/test/java/org/apache/sysds/test/functions/builtin/setoperations/BuiltinUnionTest.java b/src/test/java/org/apache/sysds/test/functions/builtin/setoperations/BuiltinUnionTest.java new file mode 100644 index 0000000..8c5c02f --- /dev/null +++ b/src/test/java/org/apache/sysds/test/functions/builtin/setoperations/BuiltinUnionTest.java @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.sysds.test.functions.builtin.setoperations; + +import org.apache.sysds.common.Types; +import org.apache.sysds.runtime.matrix.data.MatrixValue; +import org.apache.sysds.test.AutomatedTestBase; +import org.apache.sysds.test.TestConfiguration; +import org.apache.sysds.test.TestUtils; +import org.junit.Test; + +import java.util.HashMap; + +public class BuiltinUnionTest extends SetOperationsTestBase { + private final static String TEST_NAME = "union"; + private final static String TEST_DIR = "functions/builtin/"; + private static final String TEST_CLASS_DIR = TEST_DIR + BuiltinUnionTest.class.getSimpleName() + "/"; + + public BuiltinUnionTest(Types.ExecType execType) { + super(TEST_NAME, TEST_DIR, TEST_CLASS_DIR, execType); + } +} \ No newline at end of file diff --git a/src/test/java/org/apache/sysds/test/functions/builtin/setoperations/SetOperationsTestBase.java b/src/test/java/org/apache/sysds/test/functions/builtin/setoperations/SetOperationsTestBase.java new file mode 100644 index 0000000..7639791 --- /dev/null +++ b/src/test/java/org/apache/sysds/test/functions/builtin/setoperations/SetOperationsTestBase.java @@ -0,0 +1,175 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.sysds.test.functions.builtin.setoperations;
+
+import org.apache.sysds.common.Types;
+import org.apache.sysds.runtime.matrix.data.MatrixValue;
+import org.apache.sysds.test.AutomatedTestBase;
+import org.apache.sysds.test.TestConfiguration;
+import org.apache.sysds.test.TestUtils;
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+import java.util.*;
+
+@RunWith(Parameterized.class)
+public abstract class SetOperationsTestBase extends AutomatedTestBase {
+	private final String TEST_NAME;
+	private final String TEST_DIR ;
+	private final String TEST_CLASS_DIR;
+
+	private final Types.ExecType execType;
+
+	public SetOperationsTestBase(String test_name, String test_dir, String test_class_dir, Types.ExecType execType){
+		TEST_NAME = test_name;
+		TEST_DIR = test_dir;
+		TEST_CLASS_DIR = test_class_dir;
+
+		this.execType = execType;
+	}
+
+	@Parameterized.Parameters
+	public static Collection<Object[]> types(){
+		return Arrays.asList(new Object[][]{
+			{Types.ExecType.CP},
+			{Types.ExecType.SPARK}
+		});
+	}
+
+	@Override
+	public void setUp() {
+		TestUtils.clearAssertionInformation();
+		addTestConfiguration(TEST_NAME, new TestConfiguration(TEST_CLASS_DIR, TEST_NAME, new String[]{"R"}));
+	}
+
+	@Test
+	public void testPosNumbersAscending() {
+		double[][] X = {{1}, {2}, {3}};
+		double[][] Y = {{2}, {3}, {4}};
+
+		runUnitTest(X, Y, execType);
+	}
+
+	@Test
+	public void testPosNumbersRandomOrder() {
+		double[][] X = {{9}, {2}, {3}};
+		double[][] Y = {{2},
{3}, {4}};
+
+		runUnitTest(X, Y, execType);
+	}
+
+	@Test
+	public void testComplexPosNumbers() {
+		double[][] X = {{12},{22},{13},{4},{6},{7},{8},{9},{12},{12}};
+		double[][] Y = {{1},{2},{11},{12},{13},{18},{20},{21},{12}};
+		runUnitTest(X, Y, execType);
+	}
+
+	@Test
+	public void testNegNumbers() {
+		double[][] X = {{-10},{-5},{2}};
+		double[][] Y = {{2},{-3}};
+		runUnitTest(X, Y, execType);
+	}
+
+	@Test
+	public void testFloatingPNumbers() {
+		double[][] X = {{2},{2.5},{4}};
+		double[][] Y = {{2.4},{2}};
+		runUnitTest(X, Y, execType);
+	}
+
+	@Test
+	public void testNegAndFloating() {
+		double[][] X = {{1.4}, {-1.3}, {10}, {4}};
+		double[][] Y = {{1.3},{-1.4},{10},{9}};
+		runUnitTest(X, Y, execType);
+	}
+
+	@Test
+	public void testMinValue() {
+		double[][] X = {{Double.MIN_VALUE}, {2},{4}};
+		double[][] Y = {{2},{15}};
+		runUnitTest(X, Y, execType);
+	}
+
+	@Test
+	public void testCombined() {
+		double[][] X = {{Double.MIN_VALUE}, {4}, {-1.3}, {10}, {4}};
+		double[][] Y = {{Double.MIN_VALUE},{15},{-1.2},{-25.3}};
+		runUnitTest(X, Y, execType);
+	}
+
+//	@Test
+//	public void testYSuperSetOfX() {
+//		double[][] X = TestUtils.seq(2, 200, 4);
+//		double[][] Y = TestUtils.seq(2, 200, 2);
+//		runUnitTest(X, Y, execType);
+//	}
+
+	@Test
+	public void testXSuperSetOfY() {
+		double[][] X = TestUtils.seq(2, 200, 2);
+		double[][] Y = TestUtils.seq(2, 200, 4);
+		runUnitTest(X, Y, execType);
+	}
+
+	private void runUnitTest(double[][] X, double[][]Y, Types.ExecType instType) {
+		Types.ExecMode platformOld = setExecMode(instType);
+		try {
+			loadTestConfiguration(getTestConfiguration(TEST_NAME));
+			String HOME = SCRIPT_DIR + TEST_DIR;
+			fullDMLScriptName = HOME + TEST_NAME + ".dml";
+			programArgs = new String[]{ "-args", input("X"),input("Y"), output("R")};
+			fullRScriptName = HOME + TEST_NAME + ".R";
+			rCmd = "Rscript" + " " + fullRScriptName + " " + inputDir() + " " + expectedDir();
+
+			writeInputMatrixWithMTD("X", X, true);
+			writeInputMatrixWithMTD("Y", Y, true);
+
+
			runTest(true, false, null, -1);
+			runRScript(true);
+
+			HashMap<MatrixValue.CellIndex, Double> dmlfile = readDMLMatrixFromOutputDir("R");
+			HashMap<MatrixValue.CellIndex, Double> rfile = readRMatrixFromExpectedDir("R");
+
+
+
+			ArrayList<Double> dml_values = new ArrayList<>(dmlfile.values());
+			ArrayList<Double> r_values = new ArrayList<>(rfile.values());
+			Collections.sort(dml_values);
+			Collections.sort(r_values);
+
+			Assert.assertEquals(dml_values.size(), r_values.size());
+			Assert.assertEquals(dml_values, r_values);
+
+
+			//Junit way collection equal ignore order.
+			//Assert.assertTrue(dml_values.size() == r_values.size() && dml_values.containsAll(r_values) && r_values.containsAll(dml_values));
+		}
+		finally {
+			rtplatform = platformOld;
+		}
+	}
+
+
+}
diff --git a/src/test/scripts/functions/builtin/intersection.dml b/src/test/scripts/functions/builtin/intersect.R
similarity index 78%
copy from src/test/scripts/functions/builtin/intersection.dml
copy to src/test/scripts/functions/builtin/intersect.R
index 0d6dfad..9d9fca9 100644
--- a/src/test/scripts/functions/builtin/intersection.dml
+++ b/src/test/scripts/functions/builtin/intersect.R
@@ -19,7 +19,11 @@
 #
 #-------------------------------------------------------------
 
-A = read($1)
-B = read($2)
-[set] = intersect(X = A, Y = B);
-write(set, $3);
+args<-commandArgs(TRUE)
+options(digits=22)
+library("Matrix")
+
+X = as.matrix(readMM(paste(args[1], "X.mtx", sep="")));
+Y = as.matrix(readMM(paste(args[1], "Y.mtx", sep="")));
+R = intersect(X, Y);
+writeMM(as(R, "CsparseMatrix"), paste(args[2], "R", sep=""));
\ No newline at end of file
diff --git a/src/test/scripts/functions/builtin/intersection.dml b/src/test/scripts/functions/builtin/intersect.dml
similarity index 100%
copy from src/test/scripts/functions/builtin/intersection.dml
copy to src/test/scripts/functions/builtin/intersect.dml
diff --git a/src/test/scripts/functions/builtin/intersection.dml b/src/test/scripts/functions/builtin/setdiff.R
similarity
index 78%
copy from src/test/scripts/functions/builtin/intersection.dml
copy to src/test/scripts/functions/builtin/setdiff.R
index 0d6dfad..d8111f3 100644
--- a/src/test/scripts/functions/builtin/intersection.dml
+++ b/src/test/scripts/functions/builtin/setdiff.R
@@ -19,7 +19,11 @@
 #
 #-------------------------------------------------------------
 
-A = read($1)
-B = read($2)
-[set] = intersect(X = A, Y = B);
-write(set, $3);
+args<-commandArgs(TRUE)
+options(digits=22)
+library("Matrix")
+
+X = as.matrix(readMM(paste(args[1], "X.mtx", sep="")));
+Y = as.matrix(readMM(paste(args[1], "Y.mtx", sep="")));
+R = setdiff(X, Y);
+writeMM(as(R, "CsparseMatrix"), paste(args[2], "R", sep=""));
\ No newline at end of file
diff --git a/src/test/scripts/functions/builtin/intersection.dml b/src/test/scripts/functions/builtin/setdiff.dml
similarity index 92%
copy from src/test/scripts/functions/builtin/intersection.dml
copy to src/test/scripts/functions/builtin/setdiff.dml
index 0d6dfad..1337e04 100644
--- a/src/test/scripts/functions/builtin/intersection.dml
+++ b/src/test/scripts/functions/builtin/setdiff.dml
@@ -19,7 +19,7 @@
 #
 #-------------------------------------------------------------
 
-A = read($1)
-B = read($2)
-[set] = intersect(X = A, Y = B);
-write(set, $3);
+A = read($1);
+B = read($2);
+R = setdiff(X = A, Y = B);
+write(R, $3);
\ No newline at end of file
diff --git a/src/test/scripts/functions/builtin/intersection.dml b/src/test/scripts/functions/builtin/symmetricDifference.R
similarity index 72%
copy from src/test/scripts/functions/builtin/intersection.dml
copy to src/test/scripts/functions/builtin/symmetricDifference.R
index 0d6dfad..e1ed24c 100644
--- a/src/test/scripts/functions/builtin/intersection.dml
+++ b/src/test/scripts/functions/builtin/symmetricDifference.R
@@ -19,7 +19,14 @@
 #
 #-------------------------------------------------------------
 
-A = read($1)
-B = read($2)
-[set] = intersect(X = A, Y = B);
-write(set, $3);
+args<-commandArgs(TRUE)
+options(digits=22)
+library("Matrix")
+
+X = as.matrix(readMM(paste(args[1], "X.mtx", sep="")));
+Y = as.matrix(readMM(paste(args[1], "Y.mtx", sep="")));
+
+#both are possible
+#R = setdiff(union(X,Y), intersect(X,Y))
+R = unique(c(setdiff(X,Y), setdiff(Y,X)));
+writeMM(as(R, "CsparseMatrix"), paste(args[2], "R", sep=""));
\ No newline at end of file
diff --git a/src/test/scripts/functions/builtin/intersection.dml b/src/test/scripts/functions/builtin/symmetricDifference.dml
similarity index 91%
copy from src/test/scripts/functions/builtin/intersection.dml
copy to src/test/scripts/functions/builtin/symmetricDifference.dml
index 0d6dfad..b5168a5 100644
--- a/src/test/scripts/functions/builtin/intersection.dml
+++ b/src/test/scripts/functions/builtin/symmetricDifference.dml
@@ -19,7 +19,7 @@
 #
 #-------------------------------------------------------------
 
-A = read($1)
-B = read($2)
-[set] = intersect(X = A, Y = B);
-write(set, $3);
+A = read($1);
+B = read($2);
+R = symmetricDifference(X = A, Y = B);
+write(R, $3);
\ No newline at end of file
diff --git a/src/test/scripts/functions/builtin/intersection.dml b/src/test/scripts/functions/builtin/union.R
similarity index 75%
copy from src/test/scripts/functions/builtin/intersection.dml
copy to src/test/scripts/functions/builtin/union.R
index 0d6dfad..2afaf8b 100644
--- a/src/test/scripts/functions/builtin/intersection.dml
+++ b/src/test/scripts/functions/builtin/union.R
@@ -19,7 +19,12 @@
 #
 #-------------------------------------------------------------
 
-A = read($1)
-B = read($2)
-[set] = intersect(X = A, Y = B);
-write(set, $3);
+args<-commandArgs(TRUE)
+options(digits=22)
+library("Matrix")
+
+X = as.matrix(readMM(paste(args[1], "X.mtx", sep="")));
+Y = as.matrix(readMM(paste(args[1], "Y.mtx", sep="")));
+#R = union(X[order(X[,1]),], Y[order(Y[,1]),]);
+R = union(X, Y);
+writeMM(as(R, "CsparseMatrix"), paste(args[2], "R", sep=""));
\ No newline at end of file
diff --git
a/src/test/scripts/functions/builtin/intersection.dml b/src/test/scripts/functions/builtin/union.dml
similarity index 92%
copy from src/test/scripts/functions/builtin/intersection.dml
copy to src/test/scripts/functions/builtin/union.dml
index 0d6dfad..f2eb291 100644
--- a/src/test/scripts/functions/builtin/intersection.dml
+++ b/src/test/scripts/functions/builtin/union.dml
@@ -19,7 +19,7 @@
 #
 #-------------------------------------------------------------
 
-A = read($1)
-B = read($2)
-[set] = intersect(X = A, Y = B);
-write(set, $3);
+A = read($1);
+B = read($2);
+R = union(X = A, Y = B);
+write(R, $3);
\ No newline at end of file
diff --git a/src/test/scripts/functions/builtin/intersection.dml b/src/test/scripts/functions/builtin/unique.R
similarity index 81%
copy from src/test/scripts/functions/builtin/intersection.dml
copy to src/test/scripts/functions/builtin/unique.R
index 0d6dfad..6f4c178 100644
--- a/src/test/scripts/functions/builtin/intersection.dml
+++ b/src/test/scripts/functions/builtin/unique.R
@@ -18,8 +18,10 @@
 # under the License.
# #------------------------------------------------------------- +args<-commandArgs(TRUE) +options(digits=22) +library("Matrix") -A = read($1) -B = read($2) -[set] = intersect(X = A, Y = B); -write(set, $3); +X = as.matrix(readMM(paste(args[1], "X.mtx", sep=""))); +R = unique(X[order(X[,1]),]); +writeMM(as(R, "CsparseMatrix"), paste(args[2], "R", sep="")); \ No newline at end of file diff --git a/src/test/scripts/functions/builtin/intersection.dml b/src/test/scripts/functions/builtin/unique.dml similarity index 92% rename from src/test/scripts/functions/builtin/intersection.dml rename to src/test/scripts/functions/builtin/unique.dml index 0d6dfad..55b5aab 100644 --- a/src/test/scripts/functions/builtin/intersection.dml +++ b/src/test/scripts/functions/builtin/unique.dml @@ -19,7 +19,6 @@ # #------------------------------------------------------------- -A = read($1) -B = read($2) -[set] = intersect(X = A, Y = B); -write(set, $3); +X = read($1); +R = unique(X = X); +write(R, $2);