[ https://issues.apache.org/jira/browse/SPARK-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539865#comment-16539865 ]
ZhongYu commented on SPARK-24666:
---------------------------------
It is very easy to reproduce. We have a vocabulary of about 600,000 words and
a dataset of size 10,000,000. Just set numIterations = 20 and you will see it.
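
To confirm the symptom after training, the word vectors can be scanned for
non-finite components. The following is only a minimal sketch: the helper
names (find_non_finite, max_abs_value) and the sample data are hypothetical,
and it assumes the trained vectors have already been collected into a plain
word -> list-of-floats mapping (for example from the model's getVectors in
MLlib).

```python
import math

def find_non_finite(vectors):
    """Return the words whose vectors contain inf, -inf, or NaN."""
    return sorted(
        word for word, vec in vectors.items()
        if any(not math.isfinite(x) for x in vec)
    )

def max_abs_value(vectors):
    """Largest finite absolute component across all vectors (0.0 if none)."""
    finite = [abs(x) for vec in vectors.values() for x in vec if math.isfinite(x)]
    return max(finite, default=0.0)

# Hypothetical sample standing in for a trained model's word -> vector map.
sample = {
    "spark": [0.12, -0.53, 0.98],
    "vector": [float("inf"), 0.2, -0.1],
    "word": [0.4, float("-inf"), 0.0],
}

print(find_non_finite(sample))  # words with broken vectors
print(max_abs_value(sample))    # largest finite magnitude seen
```

Comparing the largest finite magnitude between a run with numIterations = 1
and a run with a larger value would also show the growth before the values
overflow.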
> Word2Vec generate infinity vectors when numIterations are large
> ---------------------------------------------------------------
>
> Key: SPARK-24666
> URL: https://issues.apache.org/jira/browse/SPARK-24666
> Project: Spark
> Issue Type: Bug
> Components: ML, MLlib
> Affects Versions: 2.3.1
> Environment: 2.0.X, 2.1.X, 2.2.X, 2.3.X
> Reporter: ZhongYu
> Priority: Major
>
> We found that Word2Vec generates vectors with large absolute values when
> numIterations is large, and if numIterations is large enough (>20), the
> vector values may be *infinity* (or *-infinity*), resulting in useless
> vectors.
> In normal situations, vector values are mainly within -1.0~1.0 when
> numIterations = 1.
> The bug appears in Spark 2.0.X, 2.1.X, 2.2.X, 2.3.X.
> An existing issue already reports this bug:
> https://issues.apache.org/jira/browse/SPARK-5261 , but the fix
> seems to be missing.
> Other people's reports:
> [https://stackoverflow.com/questions/49741956/infinity-vectors-in-spark-mllib-word2vec]
> [http://apache-spark-user-list.1001560.n3.nabble.com/word2vec-outputs-Infinity-Infinity-vectors-with-increasing-iterations-td29020.html]
>
>