[ 
https://issues.apache.org/jira/browse/SPARK-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065059#comment-14065059
 ] 

Matthew Farrellee commented on SPARK-1662:
------------------------------------------

[~nrchandan] and [~pwendell] - i recommend you close this as not a bug. it's 
not pyspark's fault that the user-defined class is not able to be pickled. you 
can change the Point class in the example to make it pickleable and the example 
program will work. see 
https://docs.python.org/2/library/pickle.html#what-can-be-pickled-and-unpickled

original gist for posterity -
{code}
import pyspark

class Point(object):
    '''this class being used as container'''
    pass

def to_point_obj(point_as_dict):
    '''convert a dict representation of a point to Point object'''
    p = Point()
    p.x = point_as_dict['x']
    p.y = point_as_dict['y']
    return p

def add_two_points(point_obj1, point_obj2):
    print type(point_obj1), type(point_obj2)
    point_obj1.x += point_obj2.x
    point_obj1.y += point_obj2.y
    return point_obj1

def zero_point():
    p = Point()
    p.x = p.y = 0
    return p

sc = pyspark.SparkContext('local', 'test_app')

a = sc.parallelize([{'x':1, 'y':1}, {'x':2, 'y':2}, {'x':3, 'y':3}])
b = a.map(to_point_obj)   # convert to an RDD of Point objects
c = b.fold(zero_point(), add_two_points)
{code}

> PySpark fails if python class is used as a data container
> ---------------------------------------------------------
>
>                 Key: SPARK-1662
>                 URL: https://issues.apache.org/jira/browse/SPARK-1662
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.0.0
>         Environment: Ubuntu 14, Python 2.7.6
>            Reporter: Chandan Kumar
>            Priority: Minor
>
> PySpark fails if RDD operations are performed on data encapsulated in Python 
> objects (rare use case where plain python objects are used as data containers 
> instead of regular dict or tuples).
> I have written a small piece of code to reproduce the bug:
> https://gist.github.com/nrchandan/11394440
> <script src="https://gist.github.com/nrchandan/11394440.js";></script>



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to