[ 
https://issues.apache.org/jira/browse/SPARK-3253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Derrick Burns closed SPARK-3253.
--------------------------------

    Resolution: Invalid

> KMeans cluster will fail on large number of clusters/high dimensional data
> --------------------------------------------------------------------------
>
>                 Key: SPARK-3253
>                 URL: https://issues.apache.org/jira/browse/SPARK-3253
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.0.2
>            Reporter: Derrick Burns
>
> The latest changes, which use broadcast to communicate cluster centers to 
> workers, keep the closure size small but do not avoid the problem of 
> returning the cluster centers to the master in the final collect() stage. At 
> this step, the collect() may fail because the resulting cluster centers are 
> larger than the Akka frame size can accommodate. What is frustrating is that 
> there is no indication that the failure was caused by the frame size being 
> exceeded. This makes it a Major issue, even though there is a simple 
> workaround: increasing the frame size.
> What would be helpful is a check BEFORE the clusterer begins the heavy 
> lifting. The check would compute the expected result size and compare it to 
> the Akka frame size. If the result won't fit, it should at the very least 
> emit a warning.
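
The pre-flight check proposed above could look roughly like the sketch below.
It is not MLlib code: the function names are hypothetical, the estimate
assumes dense centers stored as 8-byte doubles and ignores serialization
overhead, and the 10 MB default is an assumption about the configured frame
size (spark.akka.frameSize, which is given in MB).

```python
import warnings

BYTES_PER_DOUBLE = 8  # assumption: centers are dense vectors of doubles


def estimated_centers_bytes(k, dim):
    """Rough lower bound on the bytes needed to return k centers of length dim."""
    return k * dim * BYTES_PER_DOUBLE


def check_fits_in_frame(k, dim, frame_size_mb=10):
    """Warn and return False if the estimated result exceeds the frame size.

    frame_size_mb is assumed to mirror the configured frame size in MB;
    the default of 10 is an assumption, not a value read from Spark.
    """
    estimate = estimated_centers_bytes(k, dim)
    limit = frame_size_mb * 1024 * 1024
    if estimate > limit:
        warnings.warn(
            "Estimated cluster-center result (%d bytes) exceeds the frame "
            "size (%d bytes); collect() may fail. Consider raising the "
            "frame size." % (estimate, limit)
        )
        return False
    return True
```

Calling check_fits_in_frame(10000, 1000) estimates 80,000,000 bytes, which is
above a 10 MB frame, so the sketch warns before any clustering work starts.
Because serialization overhead is ignored, a passing check is only a
necessary condition, not a guarantee that the collect() will succeed.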



--
This message was sent by Atlassian JIRA
(v6.2#6252)

