Baunsgaard commented on a change in pull request #1040: URL: https://github.com/apache/systemds/pull/1040#discussion_r498111776
########## File path: scripts/builtin/bivar.dml ########## @@ -0,0 +1,318 @@ +#------------------------------------------------------------- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +#------------------------------------------------------------- + +# For a given pair of attribute sets, compute bivariate statistics between all attribute pairs +# Given, index1 = {A_11, A_12, ... A_1m} and index2 = {A_21, A_22, ... A_2n} +# compute bivariate stats for m*n pairs (A_1i, A_2j), (1<= i <=m) and (1<= j <=n) +# +# Six inputs: +# 1) X - input data +# 2) S1 - First attribute set {A_11, A_12, ... A_1m} +# 3) S2 - Second attribute set {A_21, A_22, ... A_2n} +# 4) T1 - kind for attributes in S1 +# 5) T2 - kind for attributes in S2 (kind=1 for scale, kind=2 for nominal, kind=3 for ordinal) +# 6) verbose - print bivar stats Review comment: i think the description is fine for now. but we probably have to change it later, since it doesn't follow the same style as some of our other builtins. ########## File path: docs/index.md ########## @@ -28,7 +28,8 @@ SystemDS's distinguishing characteristics are: 2. **Multiple execution modes**, including Spark MLContext, Spark Batch, Standalone, and JMLC. 3. **Automatic optimization** based on data and cluster characteristics to ensure both efficiency and scalability. -This version of SystemDS supports: Java 8+, Python 3.5+, Hadoop 2.6+ (Not 3.X), and Spark 2.1+ (Not 3.X). +This version of SystemDS supports: Java 8+, Python 3.5+, Hadoop 2.6+ (Not 3.X), and Spark 2.1+ (Not 3.X) Nvidia CUDA 10.2 + (CuDNN 7.x) Intel MKL (<=2019.x). Review comment: This is not part of your commits, and is shown here because GitHub is misunderstanding it. I would suggest that you rebase your commits onto master and remove the commits not done by you, such that we don't see these line changes. ########## File path: src/main/java/org/apache/sysds/runtime/controlprogram/federated/FederationUtils.java ########## @@ -139,19 +121,48 @@ public static DoubleObject aggMinMax(Future<FederatedResponse>[] ffr, boolean is } } - public static MatrixBlock rbind(Future<FederatedResponse>[] ffr) { + public static MatrixBlock bind(Future<FederatedResponse>[] ffr, boolean cbind) { // TODO handle non-contiguous cases try { MatrixBlock[] tmp = getResults(ffr); return tmp[0].append( - Arrays.copyOfRange(tmp, 1, tmp.length), - new MatrixBlock(), false); + Arrays.copyOfRange(tmp, 1, tmp.length), + new MatrixBlock(), cbind); } catch(Exception ex) { throw new DMLRuntimeException(ex); } } + public static MatrixBlock aggMinMax(Future<FederatedResponse>[] ffr, boolean isMin, boolean isScalar, Optional<Boolean> colFed) { + try { + if (! colFed.isPresent()) { + double res = isMin ? Double.MAX_VALUE : -Double.MAX_VALUE; + for (Future<FederatedResponse> fr : ffr) { + double v = isScalar ? ((ScalarObject) fr.get().getData()[0]).getDoubleValue() : + isMin ? ((MatrixBlock) fr.get().getData()[0]).min() : ((MatrixBlock) fr.get().getData()[0]).max(); + res = isMin ? Math.min(res, v) : Math.max(res, v); + } + return new MatrixBlock(1, 1, res); + } else { + MatrixBlock[] tmp = getResults(ffr); + int dim = colFed.get() ? tmp[0].getNumRows() : tmp[0].getNumColumns(); + + for (int i = 0; i < ffr.length - 1; i++) + for (int j = 0; j < dim; j++) + if (colFed.get()) + tmp[i + 1].setValue(j, i, isMin ? Double.min(tmp[i].getValue(j, i), tmp[i + 1].getValue(j, i)) : + Double.max(tmp[i].getValue(j, i), tmp[i + 1].getValue(j, i))); + else tmp[i + 1].setValue(i, j, isMin ? Double.min(tmp[i].getValue(i, j), tmp[i + 1].getValue(i, j)) : + Double.max(tmp[i].getValue(i, j), tmp[i + 1].getValue(i, j))); Review comment: this if else statement, is more confusing than it has to be. Please split some of the function calls out, since many of them actually are duplicate calls. Also calling Double.min and Double.max seems overkill when you can simply use < or > . Furthermore the way it is coded now overwrites each matrix blocks value, in sequence, rather than just reading. Usually reading and then writing to a variable is slower than simply reading(not 100% sure in this case). therefore simply only change a value when you have to, in this case when you encounter a different min or max. ########## File path: scripts/builtin/bivar.dml ########## @@ -0,0 +1,318 @@ +#------------------------------------------------------------- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +#------------------------------------------------------------- + +# For a given pair of attribute sets, compute bivariate statistics between all attribute pairs +# Given, index1 = {A_11, A_12, ... A_1m} and index2 = {A_21, A_22, ... A_2n} +# compute bivariate stats for m*n pairs (A_1i, A_2j), (1<= i <=m) and (1<= j <=n) +# +# Six inputs: +# 1) X - input data +# 2) S1 - First attribute set {A_11, A_12, ... A_1m} +# 3) S2 - Second attribute set {A_21, A_22, ... A_2n} +# 4) T1 - kind for attributes in S1 +# 5) T2 - kind for attributes in S2 (kind=1 for scale, kind=2 for nominal, kind=3 for ordinal) +# 6) verbose - print bivar stats + +m_bivar = function(Matrix[Double] X, Matrix[Double] S1, Matrix[Double] S2, Matrix[Double] T1, Matrix[Double] T2, Boolean verbose) + return (Matrix[Double] basestats_scale_scale, Matrix[Double] basestats_nominal_scale, Matrix[Double] basestats_nominal_nominal, Matrix[Double] basestats_ordinal_ordinal) +{ + s1size = ncol(S1); Review comment: also while you are at it, then fixing the indentation would be nice. ########## File path: src/test/scripts/functions/federated/FederatedMinMaxTestReference.dml ########## @@ -0,0 +1,27 @@ +#------------------------------------------------------------- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +#------------------------------------------------------------- + +if($5) { A = rbind(read($1), read($2)); } else { A = cbind(read($1), read($2)); } + +r = cbind(rowMins(A), rowMaxs(A)); +c = rbind(colMins(A), colMaxs(A)); +write(r, $3); +write(c, $4); Review comment: same here, this is two different tests. - cbind - rbind ########## File path: src/main/java/org/apache/sysds/runtime/controlprogram/federated/FederationUtils.java ########## @@ -47,6 +41,9 @@ import org.apache.sysds.runtime.matrix.operators.ScalarOperator; import org.apache.sysds.runtime.matrix.operators.SimpleOperator; +import java.util.*; Review comment: your IDE probably did this. We in general don't do wildcard imports, so please revert this. ########## File path: src/test/java/org/apache/sysds/test/functions/federated/algorithms/FederatedBivarTest.java ########## @@ -37,38 +36,34 @@ @RunWith(value = Parameterized.class) @net.jcip.annotations.NotThreadSafe public class FederatedBivarTest extends AutomatedTestBase { - private final static String TEST_DIR = "functions/federated/"; - private final static String TEST_NAME = "FederatedBivarTest"; - private final static String TEST_CLASS_DIR = TEST_DIR + FederatedUnivarTest.class.getSimpleName() + "/"; - private final static int blocksize = 1024; - @Parameterized.Parameter() - public int rows; - @Parameterized.Parameter(1) - public int cols; + private final static String TEST_DIR = "functions/federated/"; + private final static String TEST_NAME = "FederatedBivarTest"; + private final static String TEST_CLASS_DIR = TEST_DIR + FederatedBivarTest.class.getSimpleName() + "/"; + private final static int blocksize = 1024; + @Parameterized.Parameter() + public int rows; + @Parameterized.Parameter(1) + public int cols; Review comment: indentation should be tabs. ########## File path: src/main/java/org/apache/sysds/runtime/controlprogram/federated/FederationUtils.java ########## @@ -112,21 +109,6 @@ public static MatrixBlock aggMean(Future<FederatedResponse>[] ffr, FederationMap } } - public static DoubleObject aggMinMax(Future<FederatedResponse>[] ffr, boolean isMin, boolean isScalar) { Review comment: you moved this method slightly down inside the script. Doing this is confusing for GitHub, and should only be done if really needed. This change makes it harder to trace back how the function previously looked, and worked. (ps: you don't have to move it back.) ########## File path: src/test/java/org/apache/sysds/test/functions/federated/primitives/FederatedMinMaxTest.java ########## @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.sysds.test.functions.federated.primitives; + +import org.apache.sysds.api.DMLScript; +import org.apache.sysds.common.Types; +import org.apache.sysds.runtime.meta.MatrixCharacteristics; +import org.apache.sysds.test.AutomatedTestBase; +import org.apache.sysds.test.TestConfiguration; +import org.apache.sysds.test.TestUtils; +import org.junit.Assert; +import org.junit.Ignore; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.Parameterized; + +import java.util.Arrays; +import java.util.Collection; + + +@RunWith(value = Parameterized.class) [email protected] +public class FederatedMinMaxTest extends AutomatedTestBase { + private final static String TEST_DIR = "functions/federated/"; + private final static String TEST_NAME = "FederatedMinMaxTest"; + private final static String TEST_CLASS_DIR = TEST_DIR + FederatedMinMaxTest.class.getSimpleName() + "/"; + + private final static int blocksize = 1024; + @Parameterized.Parameter() + public int rows; + @Parameterized.Parameter(1) + public int cols; + @Parameterized.Parameter(2) + public boolean rowPartitioned; + + @Parameterized.Parameters + public static Collection<Object[]> data() { + return Arrays.asList(new Object[][] { + {1000, 10, false}, {10, 1000, false}, + {1000, 10, true}, {10, 1000, true}, {1000, 1, true} + }); + } + + @Override + public void setUp() { + TestUtils.clearAssertionInformation(); + addTestConfiguration(TEST_NAME, new TestConfiguration(TEST_CLASS_DIR, TEST_NAME, new String[] {"S", "R", "C"})); + } + + @Test +// @Ignore Review comment: please remove this when done ########## File path: src/test/scripts/functions/federated/FederatedMinMaxTest.dml ########## @@ -0,0 +1,47 @@ +#------------------------------------------------------------- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +#------------------------------------------------------------- + +if ($rP) { + A = federated(addresses=list($in, $in), ranges=list(list(0, 0), list($rows / 2, $cols), list($rows / 2, 0), list($rows, $cols))); +} else { + A = federated(addresses=list($in, $in), ranges=list(list(0, 0), list($rows, $cols / 2), list(0, $cols / 2), list($rows, $cols))); +} +s = matrix(0, 1, 2); +s[1, 1] = min(A); +s[1, 2] = max(A); + +/* +r = cbind(rowMins(A), rowMaxs(A)); +c = rbind(colMins(A), colMaxs(A)); +*/ + +r = matrix(0, nrow(A), 2); +c = matrix(0, 2, ncol(A)); + +r[, 1] = rowMins(A); +r[, 2] = rowMaxs(A); + +c[1, ] = colMins(A); +c[2, ] = colMaxs(A); + +write(s, $out_S); +write(r, $out_R); +write(c, $out_C); Review comment: one write per test. split this into 4 different scripts. - rowMins - rowMaxs - colMins - colMaxs then if they are not made: - max - min ########## File path: src/main/java/org/apache/sysds/runtime/controlprogram/federated/FederationUtils.java ########## @@ -179,22 +190,32 @@ public static ScalarObject aggScalar(AggregateUnaryOperator aop, Future<Federate } public static MatrixBlock aggMatrix(AggregateUnaryOperator aop, Future<FederatedResponse>[] ffr, FederationMap map) { - // handle row aggregate - if( aop.isRowAggregate() ) { - //independent of aggregation function for row-partitioned federated matrices - return rbind(ffr); + + if (aop.aggOp.increOp.fn instanceof Builtin && + (((Builtin) aop.aggOp.increOp.fn).getBuiltinCode() == BuiltinCode.MIN || + ((Builtin) aop.aggOp.increOp.fn).getBuiltinCode() == BuiltinCode.MAX)) { + boolean isMin = ((Builtin) aop.aggOp.increOp.fn).getBuiltinCode() == BuiltinCode.MIN; + + if(aop.isRowAggregate()) + if (map.getType() == FederationMap.FType.COL) + return aggMinMax(ffr,isMin,false, Optional.of(true)); + else return bind(ffr, false); + + + if(aop.isColAggregate()) + if (map.getType() == FederationMap.FType.ROW) + return aggMinMax(ffr, isMin,false, Optional.of(false)); + else return bind(ffr, true); + + return aggMinMax(ffr, isMin, false, Optional.empty()); } - // handle col aggregate + + if( aop.isRowAggregate()) return bind(ffr, false); + //whole matrix if( aop.aggOp.increOp.fn instanceof KahanFunction ) return aggAdd(ffr); else if( aop.aggOp.increOp.fn instanceof Mean ) return aggMean(ffr, map); Review comment: here is where you have to choose how to make the aggMean colwise or rowwise just like the min and max functions. (for your other work.) ########## File path: src/main/java/org/apache/sysds/runtime/controlprogram/federated/FederationUtils.java ########## @@ -179,22 +190,32 @@ public static ScalarObject aggScalar(AggregateUnaryOperator aop, Future<Federate } public static MatrixBlock aggMatrix(AggregateUnaryOperator aop, Future<FederatedResponse>[] ffr, FederationMap map) { - // handle row aggregate - if( aop.isRowAggregate() ) { - //independent of aggregation function for row-partitioned federated matrices - return rbind(ffr); + + if (aop.aggOp.increOp.fn instanceof Builtin && Review comment: i don't like this move. the check in the else if on line 192 is the same. and now it is moved up slightly. please move the content of the if into the else if on line 192 in the previous code. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
