[
https://issues.apache.org/jira/browse/SPARK-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698190#comment-14698190
]
Yanbo Liang commented on SPARK-9793:
------------------------------------
[~josephkb] I have combined this with SPARK-9940 and merged the two PRs.
The new PR gives PySpark vectors semantic equality and a hash that uses the
first 16 entries, as Scala does.
It should fix the following issues from [~mengxr]'s list at SPARK-9750:
* Python
** DenseVector: Semantic eq but only with `DenseVector`. Default hash. -> bug
** SparseVector: Semantic eq but wrong (only with `SparseVector` and not
handling explicit zeros). Default hash. -> bug
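
To illustrate what "semantic eq" and a hash over the first 16 entries mean here, a minimal sketch in plain Python follows. It is not the code from the PR: vectors are modelled as (size, indices, values) tuples, and the helper names `dense`, `sparse`, `nonzeros`, `semantic_eq` and `semantic_hash` are hypothetical, made up for this example. The hash also simply relies on Python's built-in tuple hash, so it will not produce the same numbers as Scala.

```python
def dense(values):
    """Hypothetical stand-in for a DenseVector: every index is stored."""
    values = list(values)
    return (len(values), list(range(len(values))), values)


def sparse(size, indices, values):
    """Hypothetical stand-in for a SparseVector: only listed indices are stored."""
    return (size, list(indices), list(values))


def nonzeros(vec):
    """Sorted (index, value) pairs, dropping explicitly stored zeros."""
    _, indices, values = vec
    return sorted((i, v) for i, v in zip(indices, values) if v != 0.0)


def semantic_eq(a, b):
    """Equal when sizes match and the nonzero entries match, whatever the storage."""
    return a[0] == b[0] and nonzeros(a) == nonzeros(b)


def semantic_hash(vec, prefix=16):
    """Hash the size plus the nonzero entries whose index is below `prefix`."""
    head = tuple((i, v) for i, v in nonzeros(vec) if i < prefix)
    return hash((vec[0], head))


d = dense([1.0, 0.0, 3.0])
s_explicit_zero = sparse(3, [0, 1, 2], [1.0, 0.0, 3.0])
s_compact = sparse(3, [0, 2], [1.0, 3.0])

# All three describe the same vector, so they compare equal and hash alike.
assert semantic_eq(d, s_explicit_zero) and semantic_eq(d, s_compact)
assert semantic_hash(d) == semantic_hash(s_explicit_zero) == semantic_hash(s_compact)
```

Looking only at the first 16 entries keeps hashing cheap for long vectors. Vectors that differ only beyond that prefix collide, which is fine as long as equality still compares every entry.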
> PySpark DenseVector, SparseVector should override __eq__
> --------------------------------------------------------
>
> Key: SPARK-9793
> URL: https://issues.apache.org/jira/browse/SPARK-9793
> Project: Spark
> Issue Type: Bug
> Components: ML, PySpark
> Affects Versions: 1.5.0
> Reporter: Joseph K. Bradley
> Priority: Critical
>
> See [SPARK-9750].
> PySpark DenseVector and SparseVector do not override the equality operator
> properly. They should use semantics, not representation, for comparison.
> (This is what Scala currently does.)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)