Liquan Pei created SPARK-2510:
---------------------------------

             Summary: word2vec: Distributed Representation of Words
                 Key: SPARK-2510
                 URL: https://issues.apache.org/jira/browse/SPARK-2510
             Project: Spark
          Issue Type: New Feature
          Components: MLlib
            Reporter: Liquan Pei


We would like to add a parallel implementation of word2vec to MLlib. word2vec 
learns distributed representations of words by training on large data sets. 
The Spark programming model fits word2vec well because its training algorithm 
is embarrassingly parallel. Our initial implementation will focus on the 
skip-gram model and negative sampling. 
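
For readers unfamiliar with the terms, here is a minimal single-machine sketch 
of skip-gram training with negative sampling, in plain Python. It is only an 
illustration and not the proposed MLlib code: the function names, the 
dict-of-lists weight layout, and the uniform negative sampler are all 
assumptions made for this example (word2vec itself draws negatives from a 
smoothed unigram distribution), and nothing here is distributed.

```python
import math
import random


def skipgram_pairs(tokens, window):
    """Yield (center, context) pairs for words within `window` positions."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]


def sample_negatives(vocab, exclude, k, rng):
    """Draw k negatives uniformly (a simplification, assumed for this sketch)."""
    candidates = [w for w in vocab if w not in exclude]
    return [rng.choice(candidates) for _ in range(k)]


def sigmoid(x):
    # Clamp the input so math.exp cannot overflow on extreme scores.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def sgns_step(w_in, w_out, center, context, negatives, lr):
    """Apply one SGD update in place; return the loss measured before it."""
    v = w_in[center]
    delta_v = [0.0] * len(v)
    loss = 0.0
    # The true context word has label 1, each sampled negative has label 0.
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = w_out[word]
        score = sigmoid(sum(a * b for a, b in zip(v, u)))
        loss -= math.log(score if label == 1.0 else 1.0 - score)
        g = lr * (label - score)
        for k in range(len(v)):
            delta_v[k] += g * u[k]  # accumulate using u before it changes
            u[k] += g * v[k]        # update the output vector
    for k in range(len(v)):
        v[k] += delta_v[k]          # update the input (word) vector last
    return loss
```

A training loop would call skipgram_pairs on each sentence, draw negatives 
with sample_negatives, and pass them to sgns_step. How those updates would be 
partitioned and merged across Spark workers is not covered by this sketch. 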



--
This message was sent by Atlassian JIRA
(v6.2#6252)
