Re: ml.feature.Word2Vec.transform() very slow issue

Nick Pentreath Mon, 09 Nov 2015 23:41:42 -0800

This seems a straightforward change that purely improves efficiency, so yes,
please submit a JIRA and PR for this.


On Tue, Nov 10, 2015 at 8:56 AM, Sean Owen <[email protected]> wrote:

> Since it's a fairly expensive operation to build the Map, I tend to agree
> it should not happen in the loop.
>
> On Tue, Nov 10, 2015 at 5:08 AM, Yuming Wang <[email protected]> wrote:
>
>> Hi
>>
>>
>>
>> I found org.apache.spark.ml.feature.Word2Vec.transform() to be very slow.
>>
>> I think we should not read the broadcast variable for every sentence, so I
>> fixed this in my fork:
>>
>>
>>
>> https://github.com/979969786/spark/commit/a9f894df3671bb8df2f342de1820dab3185598f3
>>
>>
>>
>> I tested it on 20000 rows. The original version takes *5 minutes*,
>> and my version takes just *22 seconds* on the same data.
>>
>> If I'm right, I will submit a pull request.
>>
>>
>>
>> Thanks
>>
>>
>
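
For illustration, here is a minimal sketch of the pattern discussed in this
thread: building an expensive lookup Map once, outside the per-sentence loop,
instead of rebuilding it for every sentence. It is plain Python rather than
Spark code and is not the change in the linked commit. All names (ToyModel,
get_vectors, transform_slow, transform_fast) are hypothetical, and averaging
over sentence length is an assumption made only for the example.

```python
from typing import Dict, List


class ToyModel:
    """Hypothetical stand-in for a model whose word -> vector map is costly to build."""

    def __init__(self, vectors: Dict[str, List[float]]) -> None:
        self._vectors = vectors
        self.builds = 0  # how many times the map has been materialised

    def get_vectors(self) -> Dict[str, List[float]]:
        # Copying the whole table stands in for the expensive Map construction.
        self.builds += 1
        return {word: list(vec) for word, vec in self._vectors.items()}


def average(vectors: Dict[str, List[float]], sentence: List[str], dim: int) -> List[float]:
    """Sum the vectors of known words, then divide by sentence length (assumed)."""
    total = [0.0] * dim
    for word in sentence:
        vec = vectors.get(word)
        if vec is not None:
            for i, x in enumerate(vec):
                total[i] += x
    n = max(len(sentence), 1)  # avoid dividing by zero for empty sentences
    return [x / n for x in total]


def transform_slow(model: ToyModel, sentences: List[List[str]], dim: int) -> List[List[float]]:
    # The map is rebuilt inside the loop: one build per sentence.
    return [average(model.get_vectors(), s, dim) for s in sentences]


def transform_fast(model: ToyModel, sentences: List[List[str]], dim: int) -> List[List[float]]:
    # The map is built once and reused for every sentence.
    vectors = model.get_vectors()
    return [average(vectors, s, dim) for s in sentences]
```

Both functions return the same vectors; they differ only in how often
get_vectors() runs (once per sentence versus once per call), which is the cost
the thread proposes to remove.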
