[ https://issues.apache.org/jira/browse/SPARK-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065059#comment-14065059 ]
Matthew Farrellee commented on SPARK-1662: ------------------------------------------ [~nrchandan] and [~pwendell] - i recommend you close this as not a bug. it's not pyspark's fault that the user-defined class is not able to be pickled. you can change the Point class in the example to make it pickleable and the example program will work. see https://docs.python.org/2/library/pickle.html#what-can-be-pickled-and-unpickled original gist for posterity - {code} import pyspark class Point(object): '''this class being used as container''' pass def to_point_obj(point_as_dict): '''convert a dict representation of a point to Point object''' p = Point() p.x = point_as_dict['x'] p.y = point_as_dict['y'] return p def add_two_points(point_obj1, point_obj2): print type(point_obj1), type(point_obj2) point_obj1.x += point_obj2.x point_obj1.y += point_obj2.y return point_obj1 def zero_point(): p = Point() p.x = p.y = 0 return p sc = pyspark.SparkContext('local', 'test_app') a = sc.parallelize([{'x':1, 'y':1}, {'x':2, 'y':2}, {'x':3, 'y':3}]) b = a.map(to_point_obj) # convert to an RDD of Point objects c = b.fold(zero_point(), add_two_points) {code} > PySpark fails if python class is used as a data container > --------------------------------------------------------- > > Key: SPARK-1662 > URL: https://issues.apache.org/jira/browse/SPARK-1662 > Project: Spark > Issue Type: Bug > Components: PySpark > Affects Versions: 1.0.0 > Environment: Ubuntu 14, Python 2.7.6 > Reporter: Chandan Kumar > Priority: Minor > > PySpark fails if RDD operations are performed on data encapsulated in Python > objects (rare use case where plain python objects are used as data containers > instead of regular dict or tuples). > I have written a small piece of code to reproduce the bug: > https://gist.github.com/nrchandan/11394440 > <script src="https://gist.github.com/nrchandan/11394440.js"></script> -- This message was sent by Atlassian JIRA (v6.2#6252)