[ 
https://issues.apache.org/jira/browse/SPARK-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

flykobe cheng closed SPARK-7892.
--------------------------------
    Resolution: Duplicate

> Python class in __main__ may trigger AssertionError
> ---------------------------------------------------
>
>                 Key: SPARK-7892
>                 URL: https://issues.apache.org/jira/browse/SPARK-7892
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.2.0
>         Environment: Linux, Python 2.7.3
> pickled by Python pickle Lib
>            Reporter: flykobe cheng
>            Priority: Minor
>
> Callback functions for Spark transformations and actions are pickled. 
> If the callback is an instance method of a class defined in the __main__ 
> module, and that class has more than one instance method that uses class 
> properties or classmethods, the class is pickled twice and passed to 
> 'pickle.memoize' twice, which triggers an AssertionError.
> Demo code:
> import logging
> import sys
> import pyspark
>
> class AClass(object):
>     _class_var = {'classkey': 'classval', }
>
>     def main_object_method(self, item):
>         logging.warn("class var by %s: %s" % (sys._getframe().f_code.co_name, AClass._class_var['classkey']))
>
>     def main_object_method2(self, item):
>         logging.warn("class var by %s: %s" % (sys._getframe().f_code.co_name, AClass._class_var['classkey']))
>
> def test_main_object_method(sc):
>     obj = AClass()
>     res = sc.parallelize(range(4)).map(obj.main_object_method).collect()
>
> if __name__ == '__main__':
>     cf = pyspark.SparkConf()
>     cf.set('spark.cores.max', 1)
>     sc = pyspark.SparkContext(appName="flykobe_demo_pickle_error", conf=cf)
>     test_main_object_method(sc)
> Traceback:
>   File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 310, in save_function_tuple
>     save(f_globals)
>   File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 291, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 174, in save_dict
>     pickle.Pickler.save_dict(self, obj)
>   File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 654, in save_dict
>     self._batch_setitems(obj.iteritems())
>   File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 686, in _batch_setitems
>     save(v)
>   File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 291, in save
>     f(self, obj) # Call unbound method with explicit self
>   File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 468, in save_global
>     d),obj=obj)
>   File "/home/users/chengyi02/svn-root/app/ecom/darwin/local/spark-1.2.0.5-client/python/pyspark/cloudpickle.py", line 638, in save_reduce
>     self.memoize(obj)
>   File "/home/users/chengyi02/.jumbo/lib/python2.7/pickle.py", line 248, in memoize
>     assert id(obj) not in self.memo
> AssertionError
> Problem in Python/Lib/pickle.py:
>     def memoize(self, obj):
>         """Store an object in the memo."""
>         if self.fast:
>             return
>         assert id(obj) not in self.memo
>         memo_len = len(self.memo)
>         self.write(self.put(memo_len))
>         self.memo[id(obj)] = memo_len, obj
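
The assertion in `memoize` fires whenever the same object is handed to it a second time. Outside of Spark and cloudpickle, this failure mode can be reproduced directly with a minimal sketch against the pure-Python pickler (note: `pickle._Pickler` is a CPython implementation detail, used here only because it exposes `memoize()`; this is an illustrative repro, not the cloudpickle call path from the report):

```python
import io
import pickle

# The pure-Python pickler exposes memoize() directly.
p = pickle._Pickler(io.BytesIO(), protocol=2)

obj = {'classkey': 'classval'}
p.memoize(obj)  # first memoization: records id(obj) in the memo

try:
    p.memoize(obj)  # second memoization of the same object
except AssertionError:
    print("AssertionError: id(obj) already in memo")
```

In the report above, cloudpickle reaches this state because the `__main__` class is serialized (and memoized) once per instance method that references it, rather than being memoized once and referenced thereafter.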



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
