PySpark doesn't attempt to support Jython at present. IMO, while it might be a 
bit faster, it would lose a lot of the benefits of Python, chiefly its very 
strong data processing libraries (NumPy, SciPy, Pandas, etc.). So I'm not sure 
it's worth supporting unless someone demonstrates a really major performance 
benefit.

There was actually a recent patch to add PyPy support 
(https://github.com/apache/spark/pull/2144), which is worth a try if you want 
Python applications to run faster. It might even be faster overall than 
Jython.
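
If you want to try it, the gist is to point PYSPARK_PYTHON at a PyPy binary and 
run a job where the Python workers do most of the work. Here is a minimal sketch 
of the kind of script you could time under both interpreters (the file name, the 
job, and the numbers are just illustrative assumptions on my part, not anything 
from the patch):

# Hypothetical benchmark script, e.g. interp_bench.py
# Run with something like: PYSPARK_PYTHON=pypy bin/spark-submit interp_bench.py
# and again with the default CPython to compare wall-clock time.
import time
from pyspark import SparkContext

sc = SparkContext(appName="InterpreterBench")

# CPU-bound map/reduce so that interpreter speed, not I/O, dominates.
start = time.time()
total = sc.parallelize(range(1, 1000000), 8) \
          .map(lambda x: (x * x) % 97) \
          .reduce(lambda a, b: a + b)
print("sum = %d, took %.1fs" % (total, time.time() - start))

sc.stop()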

Matei

On Oct 5, 2014, at 10:16 AM, Robert C Senkbeil <rcsen...@us.ibm.com> wrote:

> 
> 
> Hi there,
> 
> I wanted to ask whether anyone has successfully used Jython with the
> pyspark library. I wasn't sure if C extension support was needed for
> pyspark itself or was just a bonus of using CPython.
> 
> There was a claim (
> http://apache-spark-developers-list.1001551.n3.nabble.com/PySpark-Driver-from-Jython-td7142.html#a7269
> ) that using Jython would be better - if you didn't need C extension
> support - because the cost of serialization is lower. However, I have not
> been able to import pyspark into a Jython session. I'm using version 2.7b3
> of Jython and version 1.1.0 of Spark for reference.
> 
> Jython 2.7b3 (default:e81256215fb0, Aug 4 2014, 02:39:51)
> [Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.7.0_51
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from pyspark import SparkContext, SparkConf
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "pyspark/__init__.py", line 63, in <module>
>  File "pyspark/context.py", line 25, in <module>
>  File "pyspark/accumulators.py", line 94, in <module>
>  File "pyspark/serializers.py", line 341, in <module>
>  File "pyspark/serializers.py", line 328, in _hijack_namedtuple
> RuntimeError: maximum recursion depth exceeded (Java StackOverflowError)
> 
> Is there something I am missing with this? Did Jython ever work for
> pyspark? The same error happens regardless of whether I use the Python
> files or compile them down to Java class files using Jython first.
> 
> I know that previous documentation (0.9.1) indicated, "PySpark requires
> Python 2.6 or higher. PySpark applications are executed using a standard
> CPython interpreter in order to support Python modules that use C
> extensions. We have not tested PySpark with Python 3 or with alternative
> Python interpreters, such as PyPy or Jython."
> 
> In later versions, it now reflects, "Spark 1.1.0 works with Python 2.6 or
> higher (but not Python 3). It uses the standard CPython interpreter, so C
> libraries like NumPy can be used."
> 
> I'm assuming this means that attempts to use other interpreters failed. If
> so, are there any plans to support something like Jython in the future?
> 
> Signed,
> Chip Senkbeil

