Liquan Pei created SPARK-2510:
---------------------------------
Summary: word2vec: Distributed Representation of Words
Key: SPARK-2510
URL: https://issues.apache.org/jira/browse/SPARK-2510
Project: Spark
Issue Type: New Feature
Components: MLlib
Reporter: Liquan Pei
We would like to add a parallel implementation of word2vec to MLlib. word2vec
learns distributed representations of words by training on large data sets.
The Spark programming model fits word2vec well because the training algorithm
is embarrassingly parallel. Our initial implementation will focus on the
skip-gram model and negative sampling.
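
For reference, the skip-gram update with negative sampling can be sketched on a
single machine as below. This is only an illustration in plain Python, not the
proposed MLlib code: the function names (skipgram_pairs, sgns_step), the
dict-of-lists weight layout, and the choice to have the caller supply the
negative samples are all assumptions made for the example.

```python
import math


def skipgram_pairs(tokens, window):
    """Return (center, context) pairs for words at most `window` positions apart."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sgns_step(w_in, w_out, center, context, negatives, lr):
    """Apply one SGD step for a single (center, context) pair.

    w_in and w_out map each word to a list of floats (hypothetical layout).
    `negatives` is a list of sampled noise words chosen by the caller.
    Returns the loss measured before the update.
    """
    v = w_in[center]
    grad_v = [0.0] * len(v)
    loss = 0.0
    # The true context word has label 1, every sampled noise word has label 0.
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = w_out[word]
        score = sigmoid(sum(a * b for a, b in zip(v, u)))
        loss -= math.log(score if label == 1.0 else 1.0 - score)
        g = lr * (label - score)
        for k in range(len(v)):
            grad_v[k] += g * u[k]  # accumulate using the output vector before its update
            u[k] += g * v[k]       # update the output vector in place
    # Update the input vector once, after all output vectors have been visited.
    for k in range(len(v)):
        v[k] += grad_v[k]
    return loss
```

In a distributed setting, one plausible arrangement (again an assumption, not
something this issue specifies) is to run updates like sgns_step independently
on each partition of the corpus and combine the resulting vectors afterwards.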