Thanks for helping with the Dill integration. I made some early attempts,
but had to set them aside when I got busy with other work.

To bring everyone up to speed on the context:
There are some objects that PySpark’s `cloudpickle` library doesn’t serialize
properly, such as `operator.getattr`
(https://issues.apache.org/jira/browse/SPARK-791) or namedtuples
(https://issues.apache.org/jira/browse/SPARK-1687).
My early attempt at replacing `cloudpickle` with Dill ran into problems because
of slight differences in how Dill pickles functions defined in doctests versus
functions defined elsewhere. I opened a bug report for this with the Dill
developers (https://github.com/uqfoundation/dill/issues/18), who subsequently
fixed the bug (https://github.com/uqfoundation/dill/pull/29).
It looks like there are already a couple of Dill issues with examples of the
“Can’t pickle _: it’s not found as _” bug
(https://github.com/uqfoundation/dill/search?q=%22not+found+as%22&type=Issues).
If you can find a small test case that reproduces this bug, I’d consider
opening a new Dill issue.
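A minimal sketch of the mechanism behind the “not found as” message follows. It assumes Python 3 and uses only the standard-library `pickle` module, not Dill or `cloudpickle`. Classes are pickled by reference, meaning only a module name plus an attribute name are written. Pickling fails when looking that name up in the module does not find the object. The class `Phantom`, its name `NoSuchName`, and the helper `pickle_error` are hypothetical. They are chosen to mimic a type that names a module which does not actually expose it, as with `_sre.SRE_Pattern`.

```python
import collections
import pickle


def pickle_error(obj):
    """Try to pickle obj; return None on success, else the error message."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError) as exc:
        # The exception type and wording differ across Python versions
        # (Python 2's pickle.py says "it's not found as module.name"),
        # so both exception types are caught here.
        return str(exc)
    return None


# Classes are pickled by reference: the module name and class name are
# written to the stream, and unpickling looks them up again.
payload = pickle.dumps(collections.OrderedDict)
assert b"collections" in payload and b"OrderedDict" in payload

# Hypothetical class that claims to live in `collections` under a name the
# module does not define, mirroring a type that reports itself as
# _sre.SRE_Pattern when _sre exposes no attribute by that name.
Phantom = type("NoSuchName", (), {})
Phantom.__module__ = "collections"

print(pickle_error(collections.OrderedDict))  # None
print(pickle_error(Phantom))  # an error message; wording varies by version
```

Both `cloudpickle` and Dill can serialize some objects by value rather than by reference. As a result, which objects hit this lookup failure may depend on the library in use.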

- Josh
On June 19, 2014 at 7:48:13 AM, Mark Baker (dist...@acm.org) wrote:

Hi. As part of my attempt to port PySpark to Python 3, I've
re-applied, with modifications, Josh's old commit for using Dill with
PySpark (as Dill already supports Python 3). Alas, I ran into an odd
problem that I could use some help with.

Josh's old commit:

https://github.com/JoshRosen/incubator-spark/commit/2ac8986f3009f0dc133b11d16887fc8ddb33c3d1

My Dill branch:

https://github.com/distobj/spark/tree/dill  

(Note: I've been running this in a virtualenv into which I
pip-installed Dill. I haven't yet figured out the new way to package
it in python/lib, as was done for Py4J.)

So the problem is that run_tests fails with this pickle.py error
on most of the tests (those using .cache(), it seems, unsurprisingly):

PicklingError: Can't pickle <type '_sre.SRE_Pattern'>: it's not found as _sre.SRE_Pattern

What's odd is that the same doctests work fine when run from the shell.  

TIA for any ideas...  
