[ https://issues.apache.org/jira/browse/SPARK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matei Zaharia updated SPARK-2494:
---------------------------------

    Target Version/s: 1.1.0, 1.0.2, 0.9.3  (was: 1.1.0, 1.0.2)

> Hash of None is different cross machines in CPython
> ---------------------------------------------------
>
>                 Key: SPARK-2494
>                 URL: https://issues.apache.org/jira/browse/SPARK-2494
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 0.9.0, 0.9.1, 0.9.2, 1.0.0, 1.0.1
>         Environment: CPython 2.x
>            Reporter: Davies Liu
>              Labels: pyspark, shuffle
>             Fix For: 1.1.0, 1.0.2, 0.9.3
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The hash of None, and of any tuple that contains None, differs across
> machines, so partitionBy() produces wrong results if None appears in a key.
> The default partition function should use a portable hash function, one
> that generates the same hash on every machine for all the built-in
> immutable types, especially tuples.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
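Why this breaks partitioning: in CPython 2.x, None does not define its own hash, so hash(None) falls back to a value derived from the object's memory address, which differs between worker processes and machines. A tuple hash mixes in the hashes of its elements, so any key containing None inherits the problem, and the same key can be routed to different partitions on different workers. The following is a minimal standalone sketch of the "portable hash" idea the issue asks for. It is not the patch that fixed SPARK-2494; the names `portable_hash` and `partition_for` are hypothetical, and the tuple-combining step only borrows the starting constant (0x345678) and multiplier (1000003) from CPython 2.7's tuple hash in a simplified form.

```python
import sys


def portable_hash(x):
    """Hash that does not depend on object addresses (illustrative sketch).

    None maps to a fixed value, tuples are combined element by element,
    and every other value falls back to the built-in hash().
    """
    if x is None:
        # Fixed value instead of the address-based default hash of None.
        return 0
    if isinstance(x, tuple):
        h = 0x345678
        for item in x:
            # Recurse so that nested tuples and None inside tuples are handled too.
            h ^= portable_hash(item)
            h *= 1000003
            # Mask to keep the value bounded and non-negative.
            h &= sys.maxsize
        # Mix in the length so () and (None,) hash differently.
        h ^= len(x)
        return h
    return hash(x)


def partition_for(key, num_partitions):
    """Hypothetical helper: map a key to a partition index in [0, num_partitions)."""
    # Python's % with a positive divisor is never negative,
    # even when the hash itself is negative.
    return portable_hash(key) % num_partitions
```

For example, `partition_for((None, 1), 4)` yields the same index in every process, because no step depends on where None lives in memory. This only removes the None-specific problem: on Python 3, the hashes of str and bytes are also randomized per process unless PYTHONHASHSEED is fixed, so string keys would need that setting as well.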