[ https://issues.apache.org/jira/browse/SPARK-12153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043533#comment-15043533 ]
Apache Spark commented on SPARK-12153:
--------------------------------------
User 'ygcao' has created a pull request for this issue:
https://github.com/apache/spark/pull/10152
> Word2Vec uses a fixed length for sentences which is not reasonable for
> reality, and similarity functions and fields are not accessible
> --------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-12153
> URL: https://issues.apache.org/jira/browse/SPARK-12153
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Affects Versions: 1.5.2
> Reporter: YongGang Cao
> Priority: Minor
> Labels: patch
>
> Sentence boundaries matter for the sliding window: we shouldn't train the model
> on a window that spans two sentences. The current hard split of sentences at
> 100 words doesn't really make sense.
> Also, the cosineSimilarity function is private, which makes it useless to callers.
> We may also need access to the vocabulary and the word index table, which would
> need getters.
> I made changes to address the above issues.
> Here is the pull request: https://github.com/apache/spark/pull/10152
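The sentence-boundary point in the issue can be illustrated with a small sketch. This is not Spark's implementation: it is plain Python with hypothetical helpers `fixed_length_chunks` and `skipgram_pairs`, and for simplicity it uses a fixed window rather than the randomly shrunk window of the original word2vec. It contrasts generating skip-gram (center, context) pairs from a flat token stream hard-split into fixed-length chunks against generating them per sentence.

```python
from typing import Iterable, List, Tuple


def fixed_length_chunks(words: List[str], max_len: int) -> List[List[str]]:
    """Hypothetical helper: hard-split a flat token stream into chunks of at
    most max_len words, ignoring where sentences actually start and end."""
    return [words[i:i + max_len] for i in range(0, len(words), max_len)]


def skipgram_pairs(sentences: Iterable[List[str]], window: int) -> List[Tuple[str, str]]:
    """Hypothetical helper: emit (center, context) pairs using a fixed window.
    Each input sequence is treated as one unit, so the window never reaches
    past the end of the sequence it belongs to."""
    pairs: List[Tuple[str, str]] = []
    for sentence in sentences:
        for i, center in enumerate(sentence):
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, sentence[j]))
    return pairs


if __name__ == "__main__":
    sentences = [["the", "cat", "sat"], ["dogs", "bark", "loudly"]]
    # Per-sentence training data: no pair links "sat" to "dogs".
    print(skipgram_pairs(sentences, window=1))
    # Flattened and hard-split: the chunk boundary falls mid-sentence.
    flat = [w for s in sentences for w in s]
    print(skipgram_pairs(fixed_length_chunks(flat, max_len=4), window=1))
```

With the fixed-length split, the first chunk ends with "dogs", so the pair ("sat", "dogs") crosses a sentence boundary while the genuine in-sentence pair ("dogs", "bark") is lost. Feeding sentences as the unit avoids both effects.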
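The second request, a publicly callable similarity function plus getters for the vocabulary and word index, can be sketched as a small Python class. `WordVectors`, `vocabulary`, `word_index`, and `similarity` are hypothetical names chosen for illustration; they are not the API added by the pull request or the methods of Spark's `Word2VecModel`.

```python
import math
from typing import Dict, List, Mapping, Sequence, Tuple


class WordVectors:
    """Hypothetical stand-in for a trained word2vec model that exposes a public
    cosine similarity and read-only getters for its vocabulary and word index."""

    def __init__(self, vectors: Mapping[str, Sequence[float]]):
        self._vocabulary: List[str] = list(vectors)
        self._word_index: Dict[str, int] = {w: i for i, w in enumerate(self._vocabulary)}
        self._vectors: List[List[float]] = [list(vectors[w]) for w in self._vocabulary]

    @property
    def vocabulary(self) -> Tuple[str, ...]:
        # Immutable copy: callers can read the vocabulary but not change it.
        return tuple(self._vocabulary)

    @property
    def word_index(self) -> Dict[str, int]:
        # Fresh dict on each call, so mutating it does not affect the model.
        return dict(self._word_index)

    @staticmethod
    def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
        """Cosine of the angle between v1 and v2; 0.0 if either is a zero vector."""
        if len(v1) != len(v2):
            raise ValueError("vectors must have the same length")
        dot = sum(a * b for a, b in zip(v1, v2))
        norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
        return dot / norm if norm > 0 else 0.0

    def similarity(self, w1: str, w2: str) -> float:
        """Cosine similarity between the vectors of two in-vocabulary words."""
        return self.cosine_similarity(self._vectors[self._word_index[w1]],
                                      self._vectors[self._word_index[w2]])


if __name__ == "__main__":
    model = WordVectors({"king": [1.0, 0.0], "queen": [1.0, 1.0], "apple": [0.0, 1.0]})
    print(model.vocabulary, model.word_index)
    print(model.similarity("king", "queen"))
```

Returning copies from the getters is one way to let callers inspect the tables without turning the model's internal fields into public, mutable state.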
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)