lindong28 commented on a change in pull request #24:
URL: https://github.com/apache/flink-ml/pull/24#discussion_r751889202



##########
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/classification/knn/KnnParams.java
##########
@@ -0,0 +1,38 @@
+package org.apache.flink.ml.classification.knn;
+
+import org.apache.flink.ml.param.IntParam;
+import org.apache.flink.ml.param.Param;
+import org.apache.flink.ml.param.ParamValidators;
+import org.apache.flink.ml.param.WithParams;
+import org.apache.flink.ml.params.shared.colname.HasFeatureColsDefaultAsNull;
+import org.apache.flink.ml.params.shared.colname.HasKnnDistanceType;
+import org.apache.flink.ml.params.shared.colname.HasLabelCol;
+import org.apache.flink.ml.params.shared.colname.HasPredictionCol;
+import org.apache.flink.ml.params.shared.colname.HasPredictionDetailCol;
+import org.apache.flink.ml.params.shared.colname.HasReservedColsDefaultAsNull;
+import org.apache.flink.ml.params.shared.colname.HasVectorColDefaultAsNull;
+
+/** knn fit parameters. */
+public interface KnnParams<T>

Review comment:
       Does `Knn` and `KnnModel` share exactly the same set of parameters? If 
not, we have separate parameter classes for them, e.g. `KnnParams` and 
`KnnModelParams`?

##########
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/classification/knn/KnnParams.java
##########
@@ -0,0 +1,38 @@
+package org.apache.flink.ml.classification.knn;
+
+import org.apache.flink.ml.param.IntParam;
+import org.apache.flink.ml.param.Param;
+import org.apache.flink.ml.param.ParamValidators;
+import org.apache.flink.ml.param.WithParams;
+import org.apache.flink.ml.params.shared.colname.HasFeatureColsDefaultAsNull;
+import org.apache.flink.ml.params.shared.colname.HasKnnDistanceType;
+import org.apache.flink.ml.params.shared.colname.HasLabelCol;
+import org.apache.flink.ml.params.shared.colname.HasPredictionCol;
+import org.apache.flink.ml.params.shared.colname.HasPredictionDetailCol;
+import org.apache.flink.ml.params.shared.colname.HasReservedColsDefaultAsNull;
+import org.apache.flink.ml.params.shared.colname.HasVectorColDefaultAsNull;
+
+/** knn fit parameters. */
+public interface KnnParams<T>
+        extends WithParams<T>,
+                HasVectorColDefaultAsNull<T>,
+                HasKnnDistanceType<T>,
+                HasLabelCol<T>,
+                HasFeatureColsDefaultAsNull<T>,
+                HasPredictionCol<T>,
+                HasPredictionDetailCol<T>,

Review comment:
       Spark does not support `HasPredictionDetailCol`. Any chance we can try 
to be consistent with Spark user experience for now? Otherwise, could you help 
explain the use-case and explain how this use-case is supported for Spark users?

##########
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/classification/knn/distance/EuclideanDistance.java
##########
@@ -0,0 +1,259 @@
+package org.apache.flink.ml.classification.knn.distance;
+
+import org.apache.flink.ml.common.linalg.BLAS;
+import org.apache.flink.ml.common.linalg.DenseMatrix;
+import org.apache.flink.ml.common.linalg.DenseVector;
+import org.apache.flink.ml.common.linalg.MatVecOp;
+import org.apache.flink.ml.common.linalg.SparseVector;
+import org.apache.flink.ml.common.linalg.Vector;
+import org.apache.flink.util.Preconditions;
+
+import java.util.Arrays;
+
+/**
+ * Euclidean distance is the "ordinary" straight-line distance between two 
points in Euclidean
+ * space.
+ *
+ * <p>https://en.wikipedia.org/wiki/Euclidean_distance
+ *
+ * <p>Given two vectors a and b, Euclidean Distance = ||a - b||, where ||*|| 
means the L2 norm of
+ * the vector.
+ */
+public class EuclideanDistance extends BaseFastDistance {

Review comment:
       Can we re-use `EuclideanDistanceMeasure` for now? Otherwise, we will 
need to discuss the design for Flink ML infra classes such as 
`FastDistanceVectorData`, which will take sometime I guess.
   
   Same for other Flink ML infra classes.

##########
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/common/linalg/SparseVector.java
##########
@@ -0,0 +1,574 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.ml.common.linalg;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.TreeMap;
+
+/** A sparse vector represented by an indices array and a values array. */
+public class SparseVector extends Vector {

Review comment:
       If we decide to add `SparseVector`, it probably needs to have a 
`SparseVectorTypeInfoFactory` similar to the existing 
`DenseVectorTypeInfoFactory`, to achieve best possible performance for 
serialization and deserialization.
   
   And we will also need to discuss the APIs of this class and make sure it is 
consistent with the DenseVector.
   
   Can we start simple and only add minimum APIs needed by Knn in this PR?

##########
File path: pom.xml
##########
@@ -56,6 +56,7 @@ under the License.
     <module>flink-ml-iteration</module>
     <module>flink-ml-lib</module>
     <module>flink-ml-tests</module>
+    <module>flink-ml-lib</module>

Review comment:
       It looks like this line is not needed anymore.
   
   Could you manually do `mvn clean spotless:apply package` and make sure this 
PR could pass all checks?
   

##########
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/classification/knn/KnnParams.java
##########
@@ -0,0 +1,38 @@
+package org.apache.flink.ml.classification.knn;
+
+import org.apache.flink.ml.param.IntParam;
+import org.apache.flink.ml.param.Param;
+import org.apache.flink.ml.param.ParamValidators;
+import org.apache.flink.ml.param.WithParams;
+import org.apache.flink.ml.params.shared.colname.HasFeatureColsDefaultAsNull;
+import org.apache.flink.ml.params.shared.colname.HasKnnDistanceType;
+import org.apache.flink.ml.params.shared.colname.HasLabelCol;
+import org.apache.flink.ml.params.shared.colname.HasPredictionCol;
+import org.apache.flink.ml.params.shared.colname.HasPredictionDetailCol;
+import org.apache.flink.ml.params.shared.colname.HasReservedColsDefaultAsNull;
+import org.apache.flink.ml.params.shared.colname.HasVectorColDefaultAsNull;
+
+/** knn fit parameters. */
+public interface KnnParams<T>
+        extends WithParams<T>,
+                HasVectorColDefaultAsNull<T>,
+                HasKnnDistanceType<T>,

Review comment:
       Can we re-use `HasDistanceMeasure` here? Same for `HasPredictionCol` and 
`HasFeaturesCol `.

##########
File path: 
flink-ml-lib/src/test/java/org/apache/flink/ml/classification/knn/KnnTest.java
##########
@@ -0,0 +1,168 @@
+package org.apache.flink.ml.classification.knn;
+
+import org.apache.flink.api.common.restartstrategy.RestartStrategies;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.common.typeinfo.Types;
+import org.apache.flink.api.java.typeutils.RowTypeInfo;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.ml.api.core.Pipeline;
+import org.apache.flink.ml.api.core.Stage;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import 
org.apache.flink.streaming.api.environment.ExecutionCheckpointingOptions;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.streaming.api.functions.sink.SinkFunction;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.types.Row;
+
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+/** knn algorithm test. */
+public class KnnTest {

Review comment:
       Could these unit tests verify the numerical correctness of the outputs, 
instead of simply doing System.out?
   
   In general we don't do System.out in unit tests in Flink. The existing 
`KMeansTest` and the `NaiveBayes` PR is able to test numerical correctness. And 
I think Spark also has numerical correctness test for most algorithms.
   
   
   

##########
File path: 
flink-ml-lib/src/main/java/org/apache/flink/ml/common/BatchOperator.java
##########
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.ml.common;
+
+import org.apache.flink.ml.param.Param;
+import org.apache.flink.table.api.Table;
+
+import java.util.Arrays;
+import java.util.Map;
+
+/**
+ * Base class of offline learning algorithm operators.
+ *
+ * <p>This class extends {@link BaseAlgoImpl} to support data transmission 
between BatchOperators.
+ */
+public abstract class BatchOperator<T extends BatchOperator<T>> extends 
BaseAlgoImpl<T> {

Review comment:
       Hmm.. I thought the plan is to postphone the addition and the discussion 
of `BatchOperator` and `BaseAlgoImpl`. Could we start simple for now and remove 
them from this PR?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to