Davies Liu created SPARK-2494:
---------------------------------
Summary: Hash of None is different across machines in CPython
Key: SPARK-2494
URL: https://issues.apache.org/jira/browse/SPARK-2494
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 1.0.0, 1.0.1
Environment: CPython 2.x
Reporter: Davies Liu
Priority: Blocker
Fix For: 1.0.1, 1.0.0
The hash of None, and of any tuple containing None, differs across machines, so
the result will be wrong if None appears in a key passed to partitionBy().
PySpark should use a portable hash function as the default partition function,
one that generates the same hash for all the builtin immutable types, especially tuples.
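
For illustration only, here is a minimal sketch of what such a portable hash
could look like. It is not taken from this issue or from any patch: the names
portable_hash and partition_for, the constant returned for None, and the tuple
mixing constants are all assumptions made for this example. It also assumes
that the builtin hash() is stable for the remaining key types, which may not
hold for strings when hash randomization is enabled.

```python
import sys


def portable_hash(x):
    """Hypothetical helper: hash None and tuples without using object identity."""
    if x is None:
        # Assumed convention: a fixed value, so every worker agrees on it.
        return 0
    if isinstance(x, tuple):
        # Combine element hashes recursively so that a None nested inside
        # a tuple also goes through the fixed-value branch above.
        h = 0x345678
        for item in x:
            h ^= portable_hash(item)
            h *= 1000003
            h &= sys.maxsize  # keep the running value bounded and non-negative
        h ^= len(x)
        return h
    # Everything else falls back to the builtin hash (assumed stable here).
    return hash(x)


def partition_for(key, num_partitions):
    """Hypothetical helper: map a key to a partition index in [0, num_partitions)."""
    return portable_hash(key) % num_partitions
```

With a function of this shape, a key such as (None, 1) maps to the same
partition index on every worker, because no step depends on where the None
object happens to live in memory.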
--
This message was sent by Atlassian JIRA
(v6.2#6252)