This seems like a straightforward change that purely improves efficiency, so yes, please submit a JIRA and a PR for it.
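For anyone following along, here is a minimal sketch of the pattern being discussed: building the word-to-vector Map once instead of once per sentence. It is not the Spark code and does not use Spark at all. The object name, the parallel-array "model", the `buildCount` counter, and the averaging step are all assumptions made up for illustration.

```scala
// Hypothetical stand-in for a trained model. All names here are illustrative
// and are not taken from Spark's Word2Vec implementation.
object HoistMapSketch {
  val words: Array[String] = Array("spark", "is", "fast")
  val vectors: Array[Array[Double]] = Array(
    Array(1.0, 0.0),
    Array(0.0, 1.0),
    Array(1.0, 1.0)
  )

  // Counts how many times the lookup Map is built.
  var buildCount = 0

  // The expensive step: materialize the full word -> vector Map.
  def buildVectorMap(): Map[String, Array[Double]] = {
    buildCount += 1
    words.zip(vectors).toMap
  }

  // Averages the vectors of the known words in one sentence
  // (assumed behavior, for illustration only).
  def average(sentence: Seq[String],
              lookup: Map[String, Array[Double]]): Array[Double] = {
    val sum = new Array[Double](vectors.head.length)
    var known = 0
    sentence.foreach { w =>
      lookup.get(w).foreach { v =>
        for (i <- sum.indices) sum(i) += v(i)
        known += 1
      }
    }
    if (known == 0) sum else sum.map(_ / known)
  }

  // Slow variant: the Map is rebuilt inside the per-sentence function.
  def transformSlow(sentences: Seq[Seq[String]]): Seq[Array[Double]] =
    sentences.map(s => average(s, buildVectorMap()))

  // Fast variant: the Map is built once and captured by the closure.
  def transformFast(sentences: Seq[Seq[String]]): Seq[Array[Double]] = {
    val lookup = buildVectorMap()
    sentences.map(s => average(s, lookup))
  }
}
```

With N sentences, `transformSlow` builds the Map N times and `transformFast` builds it once, while both return the same vectors. In a Spark job the same idea would mean doing the expensive lookup construction outside the per-row function, though how best to do that there is for the actual patch to decide.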
On Tue, Nov 10, 2015 at 8:56 AM, Sean Owen <so...@cloudera.com> wrote:

> Since it's a fairly expensive operation to build the Map, I tend to agree
> it should not happen in the loop.
>
> On Tue, Nov 10, 2015 at 5:08 AM, Yuming Wang <q79969...@gmail.com> wrote:
>
>> Hi,
>>
>> I found org.apache.spark.ml.feature.Word2Vec.transform() to be very slow.
>> I think we should not read the broadcast variable for every sentence, so
>> I fixed it in my fork:
>>
>> https://github.com/979969786/spark/commit/a9f894df3671bb8df2f342de1820dab3185598f3
>>
>> I tested it with 20000 rows. The original version took *5 minutes*,
>> and my version took just *22 seconds* on the same data.
>>
>> If I'm right, I will open a pull request.
>>
>> Thanks