[
https://issues.apache.org/jira/browse/FLINK-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952951#comment-15952951
]
Lev Konstantinovskiy commented on FLINK-2094:
---------------------------------------------
Apologies, that feature in Gensim has not been named correctly. It's not really
about online training, but about vocabulary expansion. It also has not been
evaluated thoroughly yet. There has been no research on how good the vectors
are for new words seen 10 times compared to words seen 1000 times in initial
training.
Even without the vocabulary expansion, word2vec is dependent on the order in
which it sees documents due to the learning rate scaling. So having it learn
"truly online", without knowing the size of the dataset, would be interesting
new territory.
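
To illustrate the learning rate point, here is a minimal sketch of a linearly
decayed learning rate of the kind word2vec implementations typically use. The
function name linear_alpha and the default values are illustrative assumptions,
not Gensim's or Flink's actual code. Note that the schedule needs total_words
up front, which is exactly what a truly online setting does not have.

```python
def linear_alpha(words_seen, total_words, alpha0=0.025, min_alpha=0.0001):
    """Hypothetical helper: learning rate decayed linearly with progress.

    total_words must be known in advance, so this schedule cannot be
    applied as-is to a stream of unknown length.
    """
    progress = min(words_seen / total_words, 1.0)
    return max(min_alpha, alpha0 - (alpha0 - min_alpha) * progress)


if __name__ == "__main__":
    total = 1_000_000  # assumed corpus size, known before training starts
    # The same document yields larger updates if seen early than if seen late.
    for seen in (0, 500_000, 1_000_000):
        print(seen, linear_alpha(seen, total))
```

Because the step size shrinks as training progresses, documents processed
early move the vectors more than documents processed late, so shuffling the
corpus changes the result.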
> Implement Word2Vec
> ------------------
>
> Key: FLINK-2094
> URL: https://issues.apache.org/jira/browse/FLINK-2094
> Project: Flink
> Issue Type: Improvement
> Components: Machine Learning Library
> Reporter: Nikolaas Steenbergen
> Assignee: Nikolaas Steenbergen
> Priority: Minor
> Labels: ML
>
> implement Word2Vec
> http://arxiv.org/pdf/1402.3722v1.pdf
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)