GitHub user felixcheung opened a pull request:

    https://github.com/apache/incubator-zeppelin/pull/176

    [ZEPPELIN-185] ZeppelinContext methods like z.show are not working with 
DataFrame in pyspark

    z.show() doesn’t seem to work properly in Python – I see the same error 
below: “AttributeError: 'DataFrame' object has no attribute '_get_object_id'"
    #Python/PySpark – doesn’t work
    rdd = sc.parallelize(["1","2","3"])
    Data = Row('first')
    df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))
    print df
    print df.collect()
    z.show(df)
    AttributeError: 'DataFrame' object has no attribute ‘_get_object_id'
    
    More generally, ZeppelinContext methods are not working with Python objects 
since Py4J would need to know how to serialize it
    
    It turns out the error is caused by Py4J trying to auto convert the 
DataFrame, which fails since it can only do that for simple types.
    Instead of getting conversion to work, the better approach is to pass along 
the inner java object instead. To do that we intercept the call on the python 
side with a wrapper object instead of letting Py4J handle it

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/felixcheung/incubator-zeppelin master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-zeppelin/pull/176.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #176
    
----
commit 7a30a14fb2b3c0e41017d909ba5c1e1a0b9c545b
Author: Felix Cheung <[email protected]>
Date:   2015-03-25T18:15:53Z

    minor doc update for running on YARN

commit 65ba046bc87cf3146ae0f80a336fb6c05d4b6619
Author: Felix Cheung <[email protected]>
Date:   2015-03-28T16:56:10Z

    Merge commit 'a007a9b5f235ebd9c608a005c5243503291d94d5'

commit e89ba083c21847a0b99c07735f106561ceee122b
Author: Felix Cheung <[email protected]>
Date:   2015-03-31T20:24:16Z

    PySpark Interpreter should allow starting with a specific version of 
Python, as PySpark does.

commit dfbb458abf6aa1c61b26bd51d40083a4d9664b53
Author: Felix Cheung <[email protected]>
Date:   2015-05-08T17:19:13Z

    Merge commit 'e23f3034053fbd8b8f4eff478c372d151a42c36b'

commit a55666d20e1d01cab8eeb5d0ba85f9255cb69f2c
Author: Felix Cheung <[email protected]>
Date:   2015-05-10T04:52:42Z

    PySpark error handling improvement - return more meaningful message from 
the original error which is useful for Spark related errors

commit c4754974f0b48a098c10b4ef2094152529eb097d
Author: Felix Cheung <[email protected]>
Date:   2015-05-10T05:01:01Z

    Merge commit '956e3f74a1b2f28fd8caa25055e77f687ca8d211'
    
    Conflicts:
        spark/src/main/resources/python/zeppelin_pyspark.py

commit 1f6b8d87d1878e4a3b961d329fea2ca8e57235ed
Author: Felix Cheung <[email protected]>
Date:   2015-06-24T05:32:04Z

    Merge commit '7599efb54e2941f097473db0b0b3c79b9af98099'

commit 6965a1c211ebe858f82ed488bdc754830bd96264
Author: Felix Cheung <[email protected]>
Date:   2015-06-24T08:34:41Z

    Build - reorder build steps and make names consistent

commit e579f34fc578114ddeba9a17d44fcb2362cf6d51
Author: Felix Cheung <[email protected]>
Date:   2015-07-22T02:24:42Z

    Merge commit '6a011b99328996fb99731e8222c77a5886a280a4'
    
    Conflicts:
        flink/pom.xml
        ignite/pom.xml
        spark/pom.xml

commit a5c545971c5fd68ee86f3e987d4874d3bac6bf33
Author: Felix Cheung <[email protected]>
Date:   2015-07-29T05:29:00Z

    Merge commit '5b89ba98f8ea52992cdd2e1a39ba03bb7168b311'

commit 82f6dcc9f6d8644d9c989cebd003e9c8b07de56d
Author: Felix Cheung <[email protected]>
Date:   2015-08-01T08:50:37Z

    [ZEPPELIN-185] ZeppelinContext methods like z.show are not working with 
DataFrame in pyspark
    It turns out the error is caused by Py4J trying to auto convert the 
DataFrame, which fails since it can only do that for simple types.
    Instead of getting conversion to work, the better approach is to pass along 
the inner java object instead. To do that we intercept the call on the python 
side with a wrapper object instead of letting Py4J handle it

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to