Hi, I tested this one and it works for me. Why is the JIRA bug still open?
Eran

On Mon, Aug 10, 2015 at 7:02 PM IT CTO <[email protected]> wrote:

> Great, I did not know. I will test it tomorrow.
> Eran
>
> On Mon, Aug 10, 2015 at 18:48, Felix Cheung <[email protected]> wrote:
>
>> Could you elaborate? Are you referring to working around this issue? The
>> fix for this has been merged.
>>
>> > From: [email protected]
>> > Date: Mon, 10 Aug 2015 11:48:13 +0000
>> > Subject: Re: [jira] [Created] (ZEPPELIN-185) z.show does not work on DataFrame in pyspark
>> > To: [email protected]
>> >
>> > Does anyone know how to solve this one? My users are using Python, and
>> > iterating through the DF each time is not useful.
>> > Eran
>> >
>> > On Sat, Jul 25, 2015 at 10:06 PM Felix Cheung (JIRA) <[email protected]> wrote:
>> >
>> > > Felix Cheung created ZEPPELIN-185:
>> > > -------------------------------------
>> > >
>> > >              Summary: z.show does not work on DataFrame in pyspark
>> > >                  Key: ZEPPELIN-185
>> > >                  URL: https://issues.apache.org/jira/browse/ZEPPELIN-185
>> > >              Project: Zeppelin
>> > >           Issue Type: Bug
>> > >           Components: Core, Interpreters
>> > >     Affects Versions: 0.6.0
>> > >             Reporter: Felix Cheung
>> > >             Assignee: Felix Cheung
>> > >
>> > >
>> > > I’ve tested this out and found these issues.
>> > > Firstly,
>> > >
>> > > http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=createdataframe#pyspark.sql.SQLContext.createDataFrame
>> > > # Code should be changed to this – it does not work in pyspark CLI otherwise
>> > > rdd = sc.parallelize(["1","2","3"])
>> > > Data = Row('first')
>> > > df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))
>> > >
>> > > Secondly,
>> > > z.show() doesn’t seem to work properly in Python – I see the same error
>> > > below: "AttributeError: 'DataFrame' object has no attribute '_get_object_id'"
>> > > #Python/PySpark – doesn’t work
>> > > rdd = sc.parallelize(["1","2","3"])
>> > > Data = Row('first')
>> > > df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))
>> > > print df
>> > > print df.collect()
>> > > z.show(df)
>> > > AttributeError: 'DataFrame' object has no attribute '_get_object_id'
>> > >
>> > > #Scala – this works
>> > > val a = sc.parallelize(List("1", "2", "3"))
>> > > val df = a.toDF()
>> > > z.show(df)
>> > >
>> > >
>> > > --
>> > > This message was sent by Atlassian JIRA
>> > > (v6.3.4#6332)
