[ https://issues.apache.org/jira/browse/SPARK-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120432#comment-14120432 ]

Evan Sparks commented on SPARK-3384:
------------------------------------

I agree with Sean. Avoiding the cost of per-merge object allocation is 
important here. As far as I can tell, we are using reduceByKey in the 
prescribed way (see: 
http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201406.mbox/%3cecd3c09a-50f3-4683-a639-daddc4101...@gmail.com%3E),
mutating the left input. I don't believe that Spark needs this mutation to be 
thread-safe, because it executes the combines sequentially on each worker and 
then reduces sequentially on the master, but I could be wrong.
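For reference, the merge in question follows roughly this shape (a minimal
sketch of the pattern, not the exact KMeans code; the name sumContribs and the
key/value types are illustrative):

    import breeze.linalg.{DenseVector => BDV}
    import org.apache.spark.rdd.RDD

    // contribs: per-point contributions keyed by cluster index,
    // each value being (running sum vector, point count).
    def sumContribs(contribs: RDD[(Int, (BDV[Double], Long))])
        : RDD[(Int, (BDV[Double], Long))] =
      contribs.reduceByKey { case ((sum1, count1), (sum2, count2)) =>
        sum1 += sum2                  // in-place Breeze addition; mutates the left input
        (sum1, count1 + count2)
      }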

> Potential thread unsafe Breeze vector addition in KMeans
> --------------------------------------------------------
>
>                 Key: SPARK-3384
>                 URL: https://issues.apache.org/jira/browse/SPARK-3384
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>            Reporter: RJ Nowling
>
> In the KMeans clustering implementation, the Breeze vectors are accumulated 
> using +=.  For example:
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala#L162
> This is potentially a thread-unsafe operation.  (This is what I observed in 
> local testing.)  I suggest changing the += to +: a new object will be 
> allocated, but it will be thread-safe since it won't write to an old location 
> accessed by multiple threads.
> Further testing is required to reproduce and verify.
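For illustration, the += to + change suggested in the report would look
roughly like the following against the sketch above (a hedged sketch, not an
actual patch; it trades one extra allocation per combine for not writing to
either input):

    // Non-mutating variant: + allocates a fresh Breeze vector per merge.
    contribs.reduceByKey { case ((sum1, count1), (sum2, count2)) =>
      (sum1 + sum2, count1 + count2)
    }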



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
