zhipeng93 commented on a change in pull request #70:
URL: https://github.com/apache/flink-ml/pull/70#discussion_r836037690
##########
File path: flink-ml-lib/src/test/java/org/apache/flink/ml/clustering/KMeansTest.java
##########
@@ -177,11 +177,20 @@ public void testFewerDistinctPointsThanCluster() {
KMeans kmeans = new KMeans().setK(2);
KMeansModel model = kmeans.fit(input);
Table output = model.transform(input)[0];
- List<Set<DenseVector>> expectedGroups =
-         Collections.singletonList(Collections.singleton(Vectors.dense(0.0, 0.1)));
- List<Set<DenseVector>> actualGroups =
-         executeAndCollect(output, kmeans.getFeaturesCol(), kmeans.getPredictionCol());
- assertTrue(CollectionUtils.isEqualCollection(expectedGroups, actualGroups));
+
+ try {
Review comment:
I agree with defining `k` as `The max number of clusters to create...`.
If there are fewer distinct points than clusters, I would suggest not
creating `k` centers by duplicating some data points, for the following two
reasons:
- Existing libraries such as Spark ML and Alink do not do this.
- There is no known use case for producing `k` centers when some of those
centers are identical.
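To illustrate the suggested behavior, here is a minimal sketch. It assumes a hypothetical helper, `DistinctCentroids.selectInitialCentroids`, which is not part of Flink ML. The helper deduplicates the input points by value and keeps at most `k` of them. For simplicity it picks them in first-seen order rather than by random sampling, so fewer than `k` distinct points yield fewer than `k` centers instead of duplicated ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not a Flink ML API): picks at most k initial centroids from distinct points. */
class DistinctCentroids {

    /**
     * Returns up to {@code k} centroids taken from the distinct points in {@code points}, in
     * first-seen order. If there are fewer than k distinct points, fewer centroids are returned
     * instead of padding the result with duplicates.
     */
    static List<double[]> selectInitialCentroids(List<double[]> points, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        // double[] compares by identity, so wrap each point in a List<Double> to dedupe by value.
        Set<List<Double>> distinct = new LinkedHashSet<>();
        for (double[] p : points) {
            List<Double> key = new ArrayList<>(p.length);
            for (double v : p) {
                key.add(v);
            }
            distinct.add(key);
            if (distinct.size() == k) {
                break; // k distinct points found; no need to scan further.
            }
        }
        // Convert the deduplicated keys back to double[] centroids.
        List<double[]> centroids = new ArrayList<>(distinct.size());
        for (List<Double> key : distinct) {
            double[] c = new double[key.size()];
            for (int i = 0; i < c.length; i++) {
                c[i] = key.get(i);
            }
            centroids.add(c);
        }
        return centroids;
    }

    public static void main(String[] args) {
        List<double[]> input = Arrays.asList(
                new double[] {0.0, 0.1}, new double[] {0.0, 0.1}, new double[] {0.0, 0.1});
        // Only one distinct point exists, so only one centroid is created even though k = 2.
        List<double[]> centroids = selectInitialCentroids(input, 2);
        System.out.println(centroids.size()); // prints 1
    }
}
```

With this approach the model for the `testFewerDistinctPointsThanCluster` input would end up with a single center, matching the single expected group in the removed assertion.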