[
https://issues.apache.org/jira/browse/SPARK-20902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon updated SPARK-20902:
---------------------------------
Labels: ML bulk-closed (was: ML)
> Word2Vec implementations with Negative Sampling
> -----------------------------------------------
>
> Key: SPARK-20902
> URL: https://issues.apache.org/jira/browse/SPARK-20902
> Project: Spark
> Issue Type: Improvement
> Components: ML, MLlib
> Affects Versions: 2.1.1
> Reporter: Shubham Chopra
> Priority: Major
> Labels: ML, bulk-closed
>
> Spark MLlib Word2Vec currently implements only Skip-Gram with hierarchical
> softmax. Both Continuous Bag of Words (CBOW) and Skip-Gram have shown
> comparable or better performance with negative sampling. This umbrella JIRA
> tracks the effort to add negative-sampling-based implementations of both
> the CBOW and Skip-Gram models to Spark MLlib.
> Since word2vec is largely a pre-processing step, its performance often
> depends on the application it is used for and the corpus it is trained
> on. These implementations give users the choice of picking the one that
> works best for their use case.
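
For readers unfamiliar with the technique named in the description, the
following is a minimal, single-machine sketch of one skip-gram update with
negative sampling. It is not Spark's code and not the implementation proposed
in this issue; the function names (init_vectors, sgns_step), the learning
rate, and the initialization range are assumptions made for illustration.
Instead of walking a Huffman tree as hierarchical softmax does, each step
scores the true context word against a handful of sampled "negative" words.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def init_vectors(vocab_size, dim, seed=0):
    # Hypothetical helper: small random vectors for a toy vocabulary.
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
            for _ in range(vocab_size)]


def sgns_step(w_in, w_out, center, context, negatives, lr=0.025):
    """Apply one SGD step for a (center, context) pair and return the
    loss measured before the update.

    w_in and w_out are lists of vectors (input and output embeddings);
    center, context and negatives are word indices.
    """
    v = w_in[center]
    grad_v = [0.0] * len(v)
    loss = 0.0
    # The true context word gets label 1, each sampled negative label 0.
    targets = [(context, 1.0)]
    targets += [(n, 0.0) for n in negatives if n != context]
    for target, label in targets:
        u = w_out[target]
        score = sigmoid(sum(a * b for a, b in zip(v, u)))
        prob = score if label == 1.0 else 1.0 - score
        loss -= math.log(max(prob, 1e-12))
        g = lr * (label - score)
        for i in range(len(v)):
            grad_v[i] += g * u[i]  # accumulate using the old output vector
            u[i] += g * v[i]       # update the output vector in place
    # Update the input vector once, after all targets were scored.
    for i in range(len(v)):
        v[i] += grad_v[i]
    return loss
```

A caller would sample the negatives from a smoothed unigram distribution and
loop over (center, context) pairs drawn from a sliding window; both are left
out here to keep the sketch short. A CBOW variant would replace the single
input vector with the average of the context word vectors.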