yunfengzhou-hub commented on a change in pull request #70:
URL: https://github.com/apache/flink-ml/pull/70#discussion_r835994436
##########
File path: flink-ml-lib/src/test/java/org/apache/flink/ml/clustering/KMeansTest.java
##########
@@ -177,11 +177,20 @@ public void testFewerDistinctPointsThanCluster() {
KMeans kmeans = new KMeans().setK(2);
KMeansModel model = kmeans.fit(input);
Table output = model.transform(input)[0];
- List<Set<DenseVector>> expectedGroups =
- Collections.singletonList(Collections.singleton(Vectors.dense(0.0, 0.1)));
- List<Set<DenseVector>> actualGroups =
- executeAndCollect(output, kmeans.getFeaturesCol(), kmeans.getPredictionCol());
- assertTrue(CollectionUtils.isEqualCollection(expectedGroups, actualGroups));
+
+ try {
Review comment:
I think the current definition of K and the behavior this test expects
contradict each other. If we want to keep the same behavior, I think we should
adopt one of the following:
- Change `K`'s description from `The number of clusters to create` to `The
max number of clusters to create`.
- If there are fewer distinct points than clusters, the training process
should still create K clusters, but some of them would be identical.
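A minimal sketch of what the two options would mean for centroid initialization, written in plain Java rather than against Flink ML's API. `CentroidInitSketch`, `initAtMostK`, and `initExactlyK` are hypothetical names. Both methods pick initial centroids deterministically from the distinct input points, which is a simplification for illustration; it does not reproduce how KMeans actually chooses its initial centroids.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of Flink ML.
final class CentroidInitSketch {

    // Collects the distinct input points, preserving first-seen order.
    // Equality uses Double.equals (bitwise), so 0.0 and -0.0 count as different.
    static List<double[]> distinctPoints(double[][] points) {
        Set<List<Double>> seen = new LinkedHashSet<>();
        for (double[] p : points) {
            List<Double> key = new ArrayList<>(p.length);
            for (double v : p) {
                key.add(v);
            }
            seen.add(key);
        }
        List<double[]> result = new ArrayList<>();
        for (List<Double> key : seen) {
            double[] p = new double[key.size()];
            for (int i = 0; i < p.length; i++) {
                p[i] = key.get(i);
            }
            result.add(p);
        }
        return result;
    }

    // Option 1: K is an upper bound, so return min(k, #distinct points) centroids.
    static double[][] initAtMostK(double[][] points, int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be positive");
        }
        List<double[]> distinct = distinctPoints(points);
        int n = Math.min(k, distinct.size());
        double[][] centroids = new double[n][];
        for (int i = 0; i < n; i++) {
            centroids[i] = distinct.get(i).clone();
        }
        return centroids;
    }

    // Option 2: always return exactly k centroids; with fewer distinct points
    // than k, distinct points are reused cyclically, so some centroids are identical.
    static double[][] initExactlyK(double[][] points, int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be positive");
        }
        List<double[]> distinct = distinctPoints(points);
        if (distinct.isEmpty()) {
            throw new IllegalArgumentException("no input points");
        }
        double[][] centroids = new double[k][];
        for (int i = 0; i < k; i++) {
            centroids[i] = distinct.get(i % distinct.size()).clone();
        }
        return centroids;
    }

    public static void main(String[] args) {
        // Only one distinct point, but k = 2.
        double[][] input = {{0.0, 0.1}, {0.0, 0.1}, {0.0, 0.1}};
        System.out.println(initAtMostK(input, 2).length);  // 1
        System.out.println(initExactlyK(input, 2).length); // 2
    }
}
```

With an input whose only distinct point is (0.0, 0.1), the point in the removed assertion's expected group, option 1 yields a single centroid (so the test's old expectation of one group holds), while option 2 yields two identical centroids, matching "The number of clusters to create" literally.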