[ 
https://issues.apache.org/jira/browse/SPARK-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061804#comment-14061804
 ] 

DjvuLee commented on SPARK-2138:
--------------------------------

[~piotrszul] In my opinion, if your task size is bigger than 
akka.frameSize, the task should fail, because akka.frameSize is the 
threshold. 

What confuses me is that the serialized task size should not keep growing 
with each iteration. In any case, it seems that KMeans can succeed in 
v1.0.1. In v0.9.0, even if you increase the akka.frameSize 
parameter, your KMeans job cannot complete.
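The growth described above is consistent with each task closure capturing the current set of cluster centers, so the serialized task scales with that captured state. A minimal Python sketch of the mechanism (hypothetical illustration only, not Spark's actual code; `captured_state` is a stand-in for the centers a closure would capture):

```python
import pickle
import random

random.seed(0)

def captured_state(num_centers, dim):
    # stand-in for the cluster centers a task closure would capture
    return [[random.random() for _ in range(dim)] for _ in range(num_centers)]

# Serialize the captured data, as Spark serializes a task's captured variables:
# the bigger the captured state, the bigger the serialized task.
small_task = len(pickle.dumps(captured_state(10, 10)))
large_task = len(pickle.dumps(captured_state(10_000, 10)))

print(small_task < large_task)
```

If the centers (or anything else in the closure) grow across iterations, the serialized task eventually crosses any fixed frame-size threshold.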

> The KMeans algorithm in the MLlib can lead to the Serialized Task size become 
> bigger and bigger
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2138
>                 URL: https://issues.apache.org/jira/browse/SPARK-2138
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 0.9.0, 0.9.1
>            Reporter: DjvuLee
>            Assignee: Xiangrui Meng
>
> When the algorithm reaches a certain stage and runs the reduceByKey() 
> function, it can lead to Executor lost and Task lost; after several occurrences, 
> the application exits.
> When this error occurs, the serialized task is bigger than 10MB, 
> and it grows larger as the iterations increase.
> the data generation file: https://gist.github.com/djvulee/7e3b2c9eb33ff0037622
> the running code: https://gist.github.com/djvulee/6bf00e60885215e3bfd5
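For tasks just above the 10MB default, the threshold itself can be raised; a minimal PySpark sketch of that configuration (the app name and the 64MB value are placeholders, not from the reporter's code):

```python
from pyspark import SparkConf, SparkContext

# Sketch only: raise spark.akka.frameSize (value in MB; the default was
# 10 MB in Spark 0.9.x) so serialized tasks above 10 MB are not rejected.
conf = (SparkConf()
        .setAppName("KMeansExample")  # placeholder app name
        .set("spark.akka.frameSize", "64"))
sc = SparkContext(conf=conf)
```

Note that, as the comment above observes, this only postpones the failure if the serialized task keeps growing with each iteration.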



--
This message was sent by Atlassian JIRA
(v6.2#6252)
