Thanks for helping with the Dill integration; I made some early attempts at it, but had to set them aside when I got busy with other work.
To bring everyone up to speed on the context: there are some objects that PySpark’s `cloudpickle` library doesn’t serialize properly, such as operator.getattr (https://issues.apache.org/jira/browse/SPARK-791) or NamedTuples (https://issues.apache.org/jira/browse/SPARK-1687).

My early attempt at replacing CloudPickle with Dill ran into problems because of slight differences in how Dill pickles functions defined in doctests versus functions defined elsewhere. I opened a bug report for this with the Dill developers (https://github.com/uqfoundation/dill/issues/18), who subsequently fixed the bug (https://github.com/uqfoundation/dill/pull/29).

It looks like there are already a couple of Dill issues with examples of the “Can’t pickle _ it’s not found as _” bug (https://github.com/uqfoundation/dill/search?q=%22not+found+as%22&type=Issues). If you can find a small test case that reproduces this bug, it would be worth opening a new Dill issue.

- Josh

On June 19, 2014 at 7:48:13 AM, Mark Baker (dist...@acm.org) wrote:

Hi. As part of my attempt to port PySpark to Python 3, I've re-applied, with modifications, Josh's old commit for using Dill with PySpark (as Dill already supports Python 3). Alas, I ran into an odd problem that I could use some help with.

Josh's old commit: https://github.com/JoshRosen/incubator-spark/commit/2ac8986f3009f0dc133b11d16887fc8ddb33c3d1

My Dill branch: https://github.com/distobj/spark/tree/dill

(Note: I've been running this in a virtualenv into which I pip-installed dill. I haven't yet figured out the new way to package it in python/lib as was done for py4j.)

The problem is that run_tests is failing with this pickle.py error on most of the tests (those using .cache(), it seems, unsurprisingly):

PicklingError: Can't pickle <type '_sre.SRE_Pattern'>: it's not found as _sre.SRE_Pattern

What's odd is that the same doctests work fine when run from the shell.

TIA for any ideas...
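
For anyone trying to narrow the "not found as" error down to a small test case, a sketch along the following lines might help. It is not taken from PySpark or Dill: the helper name `find_unpicklable` and the list of candidate objects are hypothetical, and it only uses the standard-library `pickle` module. Which objects fail will depend on the Python version and on the pickler being tested.

```python
import pickle
import re


def find_unpicklable(objects, dumps=pickle.dumps, loads=pickle.loads):
    """Return {name: error message} for each object that fails a round trip.

    `dumps` and `loads` are parameters so that another pickler can be
    swapped in for comparison.
    """
    failures = {}
    for name, obj in objects.items():
        try:
            loads(dumps(obj))
        except Exception as exc:  # e.g. PicklingError, AttributeError, TypeError
            failures[name] = "%s: %s" % (type(exc).__name__, exc)
    return failures


# Hypothetical candidates, chosen to resemble what a test closure might capture.
candidates = {
    "compiled_regex": re.compile(r"\d+"),         # a pattern instance
    "regex_type": type(re.compile(r"\d+")),       # the pattern type itself
    "lambda": lambda x: x + 1,                    # plain pickle cannot handle this
}

failures = find_unpicklable(candidates)
for name, message in sorted(failures.items()):
    print(name, "->", message)
```

Assuming dill is installed, calling `find_unpicklable(candidates, dumps=dill.dumps, loads=dill.loads)` would run the same check against Dill, which makes it easier to see whether a failure is specific to one pickler before reporting it.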