Xusen Yin created SPARK-12834:
---------------------------------

             Summary: Use type conversion instead of Ser/De of Pickle to 
transform JavaArray and JavaList
                 Key: SPARK-12834
                 URL: https://issues.apache.org/jira/browse/SPARK-12834
             Project: Spark
          Issue Type: Improvement
            Reporter: Xusen Yin


According to the Ser/De code on the Python side:

{code:title=StringIndexerModel|theme=FadeToGrey|linenumbers=true|language=python|firstline=0001|collapse=false}
def _java2py(sc, r, encoding="bytes"):
    if isinstance(r, JavaObject):
        clsName = r.getClass().getSimpleName()
        # convert RDD into JavaRDD
        if clsName != 'JavaRDD' and clsName.endswith("RDD"):
            r = r.toJavaRDD()
            clsName = 'JavaRDD'

        if clsName == 'JavaRDD':
            jrdd = sc._jvm.SerDe.javaToPython(r)
            return RDD(jrdd, sc)

        if clsName == 'DataFrame':
            return DataFrame(r, SQLContext.getOrCreate(sc))

        if clsName in _picklable_classes:
            r = sc._jvm.SerDe.dumps(r)
        elif isinstance(r, (JavaArray, JavaList)):
            try:
                r = sc._jvm.SerDe.dumps(r)
            except Py4JJavaError:
                pass  # not picklable

    if isinstance(r, (bytearray, bytes)):
        r = PickleSerializer().loads(bytes(r), encoding=encoding)
    return r
{code}

We use SerDe.dumps to serialize JavaArray and JavaList in PythonMLLibAPI, then 
deserialize them with PickleSerializer on the Python side. However, there is no 
need for this inefficient round trip. Instead, we can convert them directly 
with type conversion, e.g. list(JavaArray) or list(JavaList). In addition, 
Ser/De of a Scala Array has an issue of its own, as I described in 
https://issues.apache.org/jira/browse/SPARK-12780
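
A minimal sketch of the proposed conversion, not the actual patch. It assumes
that the Java collection proxies behave like ordinary Python sequences, as
py4j's JavaArray and JavaList do. StubJavaList and java_collection_to_list are
hypothetical names introduced here so the example runs without a JVM; with
py4j available, the tuple passed as collection_types would be
(JavaArray, JavaList) from py4j.java_collections.

```python
from collections.abc import Sequence


class StubJavaList(Sequence):
    """Hypothetical stand-in for py4j's JavaList/JavaArray, so that the
    sketch runs without a JVM. It only offers the sequence protocol."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)


def java_collection_to_list(r, collection_types=(StubJavaList,)):
    """Convert a (possibly nested) collection proxy to a Python list.

    Anything that is not one of collection_types is returned unchanged,
    so no pickle round trip is involved at any point.
    """
    if isinstance(r, collection_types):
        return [java_collection_to_list(x, collection_types) for x in r]
    return r


labels = StubJavaList(["a", "b", StubJavaList([1.0, 2.0])])
print(java_collection_to_list(labels))  # ['a', 'b', [1.0, 2.0]]
```

Elements that are themselves Java objects would stay as proxies after this
conversion, so classes in _picklable_classes would presumably still need the
SerDe.dumps path.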


