What about allocating a new Breeze vector? Could that be unsafe within Spark (across several threads)?
Best regards, Alexander

On 03.09.2014, at 23:17, "Xiangrui Meng" <men...@gmail.com> wrote:

> RJ, could you provide a code example that can reproduce the bug you
> observed in local testing? Breeze's += is not thread-safe. But in a
> Spark job, calls to a resultHandler are synchronized:
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/JobWaiter.scala#L52
> Let's move our discussion to the JIRA page. -Xiangrui
>
> On Wed, Sep 3, 2014 at 12:07 PM, RJ Nowling <rnowl...@gmail.com> wrote:
>> Here's the JIRA:
>>
>> https://issues.apache.org/jira/browse/SPARK-3384
>>
>> Even if the current implementation uses += in a thread-safe manner, it can
>> be easy to make the mistake of accidentally using += in a parallelized
>> context. I suggest changing all instances of += to +.
>>
>> I would encourage others to reproduce and validate this issue, though.
>>
>>
>> On Wed, Sep 3, 2014 at 3:02 PM, David Hall <d...@cs.berkeley.edu> wrote:
>>
>>> Mutating operations are not thread-safe. Operations that don't mutate
>>> should be thread-safe. I can't speak to what Evan said, but I would guess
>>> that the way they're using += should be safe.
>>>
>>>
>>> On Wed, Sep 3, 2014 at 11:58 AM, RJ Nowling <rnowl...@gmail.com> wrote:
>>>
>>>> David,
>>>>
>>>> Can you confirm that += is not thread-safe but + is? I'm assuming +
>>>> allocates a new object for the write, while += doesn't.
>>>>
>>>> Thanks!
>>>> RJ
>>>>
>>>>
>>>> On Wed, Sep 3, 2014 at 2:50 PM, David Hall <d...@cs.berkeley.edu> wrote:
>>>>
>>>>> In general, in Breeze we allocate separate work arrays for each call to
>>>>> LAPACK, so it should be fine. Concurrent modification isn't thread-safe,
>>>>> of course, but things that "ought" to be thread-safe really should be.
>>>>>
>>>>>
>>>>> On Wed, Sep 3, 2014 at 10:41 AM, RJ Nowling <rnowl...@gmail.com> wrote:
>>>>>
>>>>>> No, it's not in all cases.
>>>>>> Since Breeze uses LAPACK under the hood,
>>>>>> concurrent changes to memory from different threads are bad.
>>>>>>
>>>>>> There's actually a potential bug in the KMeans code where it uses +=
>>>>>> instead of +.
>>>>>>
>>>>>>
>>>>>> On Wed, Sep 3, 2014 at 1:26 PM, Ulanov, Alexander <alexander.ula...@hp.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Is the Breeze library thread-safe when called from Spark MLlib code
>>>>>>> if native libs for BLAS and LAPACK are used? Might it be an issue
>>>>>>> when running Spark locally?
>>>>>>>
>>>>>>> Best regards, Alexander
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>>>>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>>>>
>>>>>>
>>>>>> --
>>>>>> em rnowl...@gmail.com
>>>>>> c 954.496.2314
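The thread draws a distinction between Breeze's `+=`, which overwrites its left operand in place, and `+`, which allocates a new vector. Xiangrui also points out that Spark synchronizes calls to a job's result handler. Below is a minimal sketch of that reasoning. It is written in Python with plain lists standing in for Breeze vectors, so it is not Breeze or Spark code. The names `add`, `add_inplace`, `SynchronizedMerger`, `run`, and `shifted_copies` are hypothetical, and a `threading.Lock` is assumed to play the role of Spark's synchronization.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def add(a, b):
    """Hypothetical analogue of Breeze's `a + b`: returns a freshly allocated list.

    Neither input is modified, so many threads may call this on the same
    shared inputs at once.
    """
    return [x + y for x, y in zip(a, b)]


def add_inplace(a, b):
    """Hypothetical analogue of Breeze's `a += b`: overwrites `a` element by element.

    Each `a[i] += y` is a read-modify-write, so two threads updating the same
    `a` without coordination can lose updates.
    """
    for i, y in enumerate(b):
        a[i] += y
    return a


class SynchronizedMerger:
    """Hypothetical result handler that folds per-task vectors into one total.

    The lock serializes calls. It stands in for the synchronization that,
    per the thread, Spark applies to a job's result handler, and it is what
    makes the in-place update safe here.
    """

    def __init__(self, dim):
        self.total = [0.0] * dim
        self._lock = threading.Lock()

    def handle_result(self, partial):
        with self._lock:  # only one thread mutates `total` at a time
            add_inplace(self.total, partial)


def run(partials, workers=4):
    """Deliver every partial result to one merger from a pool of threads."""
    merger = SynchronizedMerger(len(partials[0]))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so any worker exception is re-raised here.
        list(pool.map(merger.handle_result, partials))
    return merger.total


def shifted_copies(base, partials, workers=4):
    """Concurrently compute base + p for each p using the allocating add.

    `base` is only read, never written, so sharing it across threads is safe.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: add(base, p), partials))


if __name__ == "__main__":
    partials = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(run(partials))  # [9.0, 12.0]
    base = [10.0, 20.0]
    print(shifted_copies(base, partials))  # [[11.0, 22.0], [13.0, 24.0], [15.0, 26.0]]
    print(base)  # [10.0, 20.0], unchanged
```

The lock is what makes the in-place merge safe. Without it, concurrent `handle_result` calls could interleave their read-modify-write steps on `total`. The allocating `add` needs no lock because its shared input is only read.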