[GitHub] [flink-ml] yunfengzhou-hub commented on a change in pull request #70: [FLINK-26313] Add Transformer and Estimator of OnlineKMeans

GitBox Sun, 27 Mar 2022 18:09:40 -0700


yunfengzhou-hub commented on a change in pull request #70:
URL: https://github.com/apache/flink-ml/pull/70#discussion_r835994436




##########
File path: flink-ml-lib/src/test/java/org/apache/flink/ml/clustering/KMeansTest.java
##########
@@ -177,11 +177,20 @@ public void testFewerDistinctPointsThanCluster() {
         KMeans kmeans = new KMeans().setK(2);
         KMeansModel model = kmeans.fit(input);
         Table output = model.transform(input)[0];
-        List<Set<DenseVector>> expectedGroups =
-                Collections.singletonList(Collections.singleton(Vectors.dense(0.0, 0.1)));
-        List<Set<DenseVector>> actualGroups =
-                executeAndCollect(output, kmeans.getFeaturesCol(), kmeans.getPredictionCol());
-        assertTrue(CollectionUtils.isEqualCollection(expectedGroups, actualGroups));
+
+        try {

Review comment:
   I think the current definition of `K` contradicts the expected behavior. If we want to keep this behavior, I think we should adopt one of the following:
   
   - Change `K`'s description from `The number of clusters to create` to `The max number of clusters to create`.
   - If there are fewer distinct points than clusters, have the training process still create `K` clusters, some of which would be identical.
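
   To make the two options concrete, here is a minimal, self-contained sketch in plain Java. It is not the flink-ml implementation: the class `KSemanticsSketch` and the methods `centroidsCapped` and `centroidsPadded` are hypothetical names used only for illustration, points are modeled as `List<Double>` rather than `DenseVector`, and the input of two identical points `(0.0, 0.1)` is an assumption based on the expected group in the test above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration only; not part of flink-ml.
class KSemanticsSketch {

    /** Returns the distinct points in first-seen order (List equality is by value). */
    static List<List<Double>> distinct(List<List<Double>> points) {
        return new ArrayList<>(new LinkedHashSet<>(points));
    }

    /** Option 1: k is an upper bound, so fewer than k centroids may be returned. */
    static List<List<Double>> centroidsCapped(List<List<Double>> points, int k) {
        List<List<Double>> unique = distinct(points);
        return new ArrayList<>(unique.subList(0, Math.min(k, unique.size())));
    }

    /** Option 2: always return exactly k centroids, repeating distinct points if needed. */
    static List<List<Double>> centroidsPadded(List<List<Double>> points, int k) {
        List<List<Double>> unique = distinct(points);
        if (unique.isEmpty()) {
            throw new IllegalArgumentException("at least one point is required");
        }
        List<List<Double>> centroids = new ArrayList<>(k);
        for (int i = 0; i < k; i++) {
            // Cycle through the distinct points so some centroids are identical.
            centroids.add(unique.get(i % unique.size()));
        }
        return centroids;
    }

    public static void main(String[] args) {
        // Assumed input: two identical points, i.e. one distinct point with k = 2.
        List<List<Double>> input = List.of(List.of(0.0, 0.1), List.of(0.0, 0.1));
        System.out.println(centroidsCapped(input, 2)); // [[0.0, 0.1]]
        System.out.println(centroidsPadded(input, 2)); // [[0.0, 0.1], [0.0, 0.1]]
    }
}
```

   With the first option the model ends up with a single centroid, which only matches the documentation if `K` is described as a maximum. With the second option the model always holds `K` centroids, so the current description stays accurate, at the cost of duplicate centroids.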




