[jira] [Updated] (SPARK-1468) The hash method used by partitionBy in Pyspark doesn't deal with None correctly.

Erik Selin (JIRA) Thu, 10 Apr 2014 16:02:02 -0700

     [ https://issues.apache.org/jira/browse/SPARK-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Erik Selin updated SPARK-1468:
------------------------------

    Description: In Python, the default hash method uses the memory address of 
objects. Since None is an object, None will get partitioned into different 
partitions depending on which Python process it is run in. This causes some 
really odd results when None keys are used in partitionBy.

> The hash method used by partitionBy in Pyspark doesn't deal with None 
> correctly.
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-1468
>                 URL: https://issues.apache.org/jira/browse/SPARK-1468
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 0.9.0
>            Reporter: Erik Selin
>
> In Python, the default hash method uses the memory address of objects. Since 
> None is an object, None will get partitioned into different partitions 
> depending on which Python process it is run in. This causes some really odd 
> results when None keys are used in partitionBy.
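
A minimal sketch of one possible workaround, not necessarily the fix that
will be applied to PySpark: map None to a fixed value before hashing, so
every worker process picks the same partition for a None key. The names
none_safe_hash and partition_for are hypothetical and exist only for this
illustration.

```python
def none_safe_hash(key):
    """Hypothetical helper: a hash that never depends on None's address."""
    if key is None:
        # Fixed value, identical in every Python process.
        return 0
    if isinstance(key, tuple):
        # Combine element hashes so a None nested in a tuple is stable too.
        h = 0x345678
        for item in key:
            h = ((h ^ none_safe_hash(item)) * 1000003) & 0xFFFFFFFF
        return h ^ len(key)
    # Every other key falls back to the built-in hash.
    return hash(key)


def partition_for(key, num_partitions, hash_func=none_safe_hash):
    """Pick a partition index the way a hash partitioner would."""
    # Python's % yields a non-negative result for a positive modulus.
    return hash_func(key) % num_partitions
```

Assuming partitionBy accepts a partition function as its second argument,
something like rdd.partitionBy(8, none_safe_hash) might serve as a stopgap.
Keys other than None and tuples still go through the built-in hash, so this
sketch only addresses the None case described above.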



--
This message was sent by Atlassian JIRA
(v6.2#6252)
