[ https://issues.apache.org/jira/browse/SPARK-3253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Derrick Burns closed SPARK-3253.
--------------------------------
Resolution: Invalid
> KMeans cluster will fail on large number of clusters/high dimensional data
> --------------------------------------------------------------------------
>
> Key: SPARK-3253
> URL: https://issues.apache.org/jira/browse/SPARK-3253
> Project: Spark
> Issue Type: Bug
> Components: MLlib
> Affects Versions: 1.0.2
> Reporter: Derrick Burns
>
> The latest changes, which use a broadcast to send the cluster centers to the
> workers, keep the closure size small, but they do not avoid the problem of
> returning the cluster centers to the master in the final collect() stage. At
> this step, collect() may fail because the resulting cluster centers are larger
> than the Akka frame size can accommodate. What is frustrating is that nothing
> indicates the failure was caused by exceeding the frame size. This makes it a
> Major issue, even though there is a simple workaround: increasing the frame
> size.
> What would help is a check BEFORE the clusterer begins the heavy lifting. The
> check would compute the expected result size and compare it to the Akka frame
> size. If the result won't fit, it should at the very least emit a warning.
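A minimal sketch of the proposed pre-flight check follows. It is not Spark code: the function names, the `frame_size_mb` parameter, and the 10 MB limit used in the example are hypothetical. The sketch also assumes the collected result is dominated by k dense centers stored as 8-byte doubles, and it ignores serialization overhead, so the estimate is a lower bound.

```python
# Hypothetical pre-flight check; none of these names are Spark APIs.
import warnings

BYTES_PER_DOUBLE = 8  # assumes dense centers stored as 64-bit doubles


def estimated_centers_bytes(k: int, dim: int) -> int:
    """Lower-bound payload size of k dense centers of dimension dim
    (raw doubles only, no serialization overhead)."""
    return k * dim * BYTES_PER_DOUBLE


def check_centers_fit(k: int, dim: int, frame_size_mb: int) -> bool:
    """Warn before clustering starts if the collected centers would likely
    exceed the frame size. Returns True if they appear to fit."""
    needed = estimated_centers_bytes(k, dim)
    limit = frame_size_mb * 1024 * 1024  # frame size given in MB
    if needed > limit:
        warnings.warn(
            f"{k} centers x {dim} dims need about {needed / 2**20:.1f} MB, "
            f"which exceeds the {frame_size_mb} MB frame size; "
            "consider increasing the frame size"
        )
        return False
    return True


# Example: a large, high-dimensional model against an assumed 10 MB limit.
check_centers_fit(k=10_000, dim=1_000, frame_size_mb=10)
```

Under these assumptions, 10,000 centers of dimension 1,000 take 80,000,000 bytes (about 76 MB). That is well above a 10 MB limit, so the check warns before any iterations run, which is the early signal the issue asks for.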
--
This message was sent by Atlassian JIRA
(v6.2#6252)