[ https://issues.apache.org/jira/browse/SPARK-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liquan Pei updated SPARK-2510:
------------------------------

    Description: We would like to add a parallel implementation of word2vec to 
MLlib. word2vec learns distributed representations of words by training on 
large data sets. We will focus on the skip-gram model and hierarchical softmax 
in our initial implementation. (was: We would like to add a parallel 
implementation of word2vec to MLlib. word2vec learns distributed 
representations of words by training on large data sets. The Spark programming 
model fits nicely with word2vec, as the training algorithm of word2vec is 
embarrassingly parallel. We will focus on the skip-gram model and negative 
sampling in our initial implementation.)

> word2vec: Distributed Representation of Words
> ---------------------------------------------
>
>                 Key: SPARK-2510
>                 URL: https://issues.apache.org/jira/browse/SPARK-2510
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Liquan Pei
>            Assignee: Liquan Pei
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> We would like to add a parallel implementation of word2vec to MLlib. word2vec 
> learns distributed representations of words by training on large data sets. 
> We will focus on the skip-gram model and hierarchical softmax in our initial 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
