[ https://issues.apache.org/jira/browse/SPARK-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Liquan Pei updated SPARK-2510:
------------------------------
Description: We would like to add a parallel implementation of word2vec to
MLlib. word2vec learns distributed representations of words by training on
large data sets. We will focus on the skip-gram model and hierarchical softmax
in our initial implementation. (was: We would like to add a parallel
implementation of word2vec to MLlib. word2vec learns distributed representations
of words by training on large data sets. The Spark programming model fits
nicely with word2vec, as the word2vec training algorithm is embarrassingly
parallel. We will focus on the skip-gram model and negative sampling in our
initial implementation.)
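In the skip-gram model, each word in a sentence is used to predict the words within a window around it, so training consumes a stream of (center, context) pairs. These pairs are generated independently for each sentence, which is one reason the work can be split across partitions. A minimal sketch, assuming a fixed window size (the original word2vec samples a smaller effective window per position); the helper `skipgram_pairs` is hypothetical and not part of MLlib:

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for a skip-gram model.

    Every token is paired with each other token at distance <= window.
    Assumption: a fixed window, not word2vec's randomly shrunk window.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


print(skipgram_pairs(["the", "quick", "brown", "fox"], window=1))
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'),
#  ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]
```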
> word2vec: Distributed Representation of Words
> ---------------------------------------------
>
> Key: SPARK-2510
> URL: https://issues.apache.org/jira/browse/SPARK-2510
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Liquan Pei
> Assignee: Liquan Pei
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> We would like to add a parallel implementation of word2vec to MLlib. word2vec
> learns distributed representations of words by training on large data
> sets. We will focus on the skip-gram model and hierarchical softmax in our
> initial implementation.
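Hierarchical softmax replaces the full softmax over the vocabulary with a binary tree whose leaves are the words. The probability of a context word is the product of sigmoid decisions along its root-to-leaf path, so each prediction costs O(log V) operations instead of O(V). A minimal sketch, assuming each word's path is given as (inner node, direction) pairs with direction +1 for left and -1 for right; the toy tree, the vectors, and the function `hs_probability` are hypothetical and illustrative, not MLlib's API:

```python
import math
from typing import Dict, List, Sequence, Tuple

# A path is a list of (inner_node_index, direction) pairs, where
# direction is +1 for "go left" and -1 for "go right" (assumed convention).
Path = List[Tuple[int, int]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def hs_probability(path: Path, node_vectors: Sequence[Sequence[float]],
                   input_vector: Sequence[float]) -> float:
    """p(word | input word) as a product of binary decisions along the path."""
    p = 1.0
    for node, direction in path:
        # P(left) = sigmoid(v_n . h), P(right) = sigmoid(-v_n . h)
        p *= sigmoid(direction * dot(node_vectors[node], input_vector))
    return p


# Toy tree: root 0 has inner children 1 (left) and 2 (right),
# and each of those has two word leaves.
paths: Dict[str, Path] = {
    "w0": [(0, +1), (1, +1)],
    "w1": [(0, +1), (1, -1)],
    "w2": [(0, -1), (2, +1)],
    "w3": [(0, -1), (2, -1)],
}
node_vectors = [[0.1, -0.2], [0.3, 0.05], [-0.4, 0.2]]
h = [0.5, 1.0]  # vector of the input (center) word

total = sum(hs_probability(path, node_vectors, h) for path in paths.values())
print(round(total, 6))  # 1.0
```

Because σ(x) + σ(-x) = 1 at every inner node, the leaf probabilities sum to 1 without computing a normalizing term over the whole vocabulary.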
--
This message was sent by Atlassian JIRA
(v6.2#6252)