PySpark doesn't attempt to support Jython at present. IMO, while it might be a bit faster, it would lose a lot of the benefits of Python, namely the very strong data processing libraries (NumPy, SciPy, Pandas, etc.). So I'm not sure it's worth supporting unless someone demonstrates a really major performance benefit.
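For illustration, here's a minimal sketch of the kind of job that depends on those libraries, with NumPy called inside an RDD closure so the heavy math runs in C (the app name and data are just placeholders):

    import numpy as np
    from pyspark import SparkContext

    sc = SparkContext(appName="NumPyExample")
    # NumPy is used inside the worker closure; this is the pattern that
    # relies on a CPython interpreter with C extension support.
    rows = sc.parallelize([[1.0, 2.0], [3.0, 4.0]])
    norms = rows.map(lambda row: float(np.linalg.norm(np.array(row)))).collect()
    print(norms)
    sc.stop()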
There was actually a recent patch to add PyPy support (https://github.com/apache/spark/pull/2144), which is worth a try if you want Python applications to run faster. It might actually be faster overall than Jython. (A rough invocation sketch follows the quoted message below.)

Matei

On Oct 5, 2014, at 10:16 AM, Robert C Senkbeil <rcsen...@us.ibm.com> wrote:

> Hi there,
>
> I wanted to ask whether or not anyone has successfully used Jython with the
> pyspark library. I wasn't sure whether C extension support was needed for
> pyspark itself or was just a bonus of using CPython.
>
> There was a claim (
> http://apache-spark-developers-list.1001551.n3.nabble.com/PySpark-Driver-from-Jython-td7142.html#a7269
> ) that using Jython would be better - if you didn't need C extension
> support - because the cost of serialization is lower. However, I have not
> been able to import pyspark into a Jython session. I'm using Jython 2.7b3
> and Spark 1.1.0 for reference.
>
> Jython 2.7b3 (default:e81256215fb0, Aug 4 2014, 02:39:51)
> [Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.7.0_51
> Type "help", "copyright", "credits" or "license" for more information.
>>>> from pyspark import SparkContext, SparkConf
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyspark/__init__.py", line 63, in <module>
>   File "pyspark/context.py", line 25, in <module>
>   File "pyspark/accumulators.py", line 94, in <module>
>   File "pyspark/serializers.py", line 341, in <module>
>   File "pyspark/serializers.py", line 328, in _hijack_namedtuple
> RuntimeError: maximum recursion depth exceeded (Java StackOverflowError)
>
> Is there something I am missing here? Did Jython ever work with pyspark?
> The same error happens regardless of whether I use the Python files
> directly or compile them down to Java class files with Jython first.
>
> I know that previous documentation (0.9.1) indicated, "PySpark requires
> Python 2.6 or higher. PySpark applications are executed using a standard
> CPython interpreter in order to support Python modules that use C
> extensions. We have not tested PySpark with Python 3 or with alternative
> Python interpreters, such as PyPy or Jython."
>
> In later versions, it now reads, "Spark 1.1.0 works with Python 2.6 or
> higher (but not Python 3). It uses the standard CPython interpreter, so C
> libraries like NumPy can be used."
>
> I'm assuming this means that attempts to use other interpreters failed. If
> so, are there any plans to support something like Jython in the future?
>
> Signed,
> Chip Senkbeil
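For reference, a minimal sketch of trying the PyPy route once that patch is in: PySpark picks up its interpreter from the PYSPARK_PYTHON environment variable, so pointing it at a PyPy binary is the whole configuration. The file name and job below are hypothetical, and this assumes a pypy binary is on the PATH:

    # pypy_test.py - a trivial pure-Python job (no C extensions), so it
    # exercises exactly the case where PyPy's JIT should help
    from pyspark import SparkContext

    sc = SparkContext(appName="PyPyTest")
    total = sc.parallelize(range(100000)).map(lambda x: x * x).sum()
    print(total)
    sc.stop()

    # Submit with PyPy as the interpreter:
    #   PYSPARK_PYTHON=pypy ./bin/spark-submit pypy_test.py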