Does anyone know how to solve this one? My users work in Python, and iterating through the DataFrame each time is not practical.

Eran

On Sat, Jul 25, 2015 at 10:06 PM Felix Cheung (JIRA) <[email protected]> wrote:

> Felix Cheung created ZEPPELIN-185:
> -------------------------------------
>
>              Summary: z.show does not work on DataFrame in pyspark
>                  Key: ZEPPELIN-185
>                  URL: https://issues.apache.org/jira/browse/ZEPPELIN-185
>              Project: Zeppelin
>           Issue Type: Bug
>           Components: Core, Interpreters
>     Affects Versions: 0.6.0
>             Reporter: Felix Cheung
>             Assignee: Felix Cheung
>
>
> I've tested this out and found these issues. Firstly,
>
> http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=createdataframe#pyspark.sql.SQLContext.createDataFrame
> # Code should be changed to this – it does not work in pyspark CLI otherwise
> rdd = sc.parallelize(["1","2","3"])
> Data = Row('first')
> df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))
>
> Secondly,
> z.show() doesn't seem to work properly in Python – I see the same error
> below: "AttributeError: 'DataFrame' object has no attribute '_get_object_id'"
>
> #Python/PySpark – doesn't work
> rdd = sc.parallelize(["1","2","3"])
> Data = Row('first')
> df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))
> print df
> print df.collect()
> z.show(df)
> AttributeError: 'DataFrame' object has no attribute '_get_object_id'
>
> #Scala – this works
> val a = sc.parallelize(List("1", "2", "3"))
> val df = a.toDF()
> z.show(df)
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
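
Until z.show(df) accepts a PySpark DataFrame, one possible workaround is to print the rows in Zeppelin's table display format. This is only a sketch. It assumes that Zeppelin renders paragraph output beginning with %table as a table, with tab-separated columns and newline-separated rows. The helper name to_zeppelin_table and its limit parameter are hypothetical and are not part of Zeppelin or Spark.

```python
from itertools import islice


def _clean(value):
    # Tabs and newlines are delimiters in the %table format,
    # so replace them inside cell values with spaces.
    return str(value).replace("\t", " ").replace("\n", " ")


def to_zeppelin_table(columns, rows, limit=1000):
    """Build a string in Zeppelin's %table display format.

    Hypothetical helper: `columns` is a list of column names and
    `rows` is an iterable of tuples (PySpark Row objects are tuples).
    At most `limit` rows are included.
    """
    lines = ["\t".join(_clean(c) for c in columns)]
    for row in islice(rows, limit):
        lines.append("\t".join(_clean(v) for v in row))
    return "%table " + "\n".join(lines)


# Stand-alone demonstration with plain tuples (no Spark needed):
print(to_zeppelin_table(["first"], [("1",), ("2",), ("3",)]))

# In a %pyspark paragraph the call would presumably look like this
# (assumption: df is the DataFrame from the report above):
#   print(to_zeppelin_table(df.columns, df.take(1000)))
```

Because the helper takes column names and rows rather than the DataFrame itself, users call it once per paragraph instead of writing their own loop over the rows each time. Note that df.take(n) brings the rows to the driver, so the limit should stay small.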
