[ 
https://issues.apache.org/jira/browse/SPARK-20902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-20902:
---------------------------------
    Labels: ML bulk-closed  (was: ML)

> Word2Vec implementations with Negative Sampling
> -----------------------------------------------
>
>                 Key: SPARK-20902
>                 URL: https://issues.apache.org/jira/browse/SPARK-20902
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 2.1.1
>            Reporter: Shubham Chopra
>            Priority: Major
>              Labels: ML, bulk-closed
>
> Spark MLlib Word2Vec currently only implements Skip-Gram+Hierarchical 
> softmax. Both Continuous bag of words (CBOW) and SkipGram have shown 
> comparable or better performance with Negative Sampling. This umbrella JIRA 
> is to keep track of the effort to add negative-sampling-based 
> implementations of both CBOW and SkipGram models to Spark MLlib.
> Since word2vec is largely a pre-processing step, the performance often can 
> depend on the application it is being used for, and the corpus it is 
> estimated on. These implementations give users the choice of picking one that 
> works best for their use-case.



