[ 
https://issues.apache.org/jira/browse/SPARK-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-4561:
------------------------------
    Description: 
In PySpark, you can call {{.asDict()}} on a Spark SQL {{Row}} to convert it
to a dictionary. Unfortunately, this does not convert nested rows to
dictionaries. For example:

{code}
>>> sqlContext.sql("select results from results").first()
Row(results=[Row(time=3.762), Row(time=3.47), Row(time=3.559), Row(time=3.458), 
Row(time=3.229), Row(time=3.21), Row(time=3.166), Row(time=3.276), 
Row(time=3.239), Row(time=3.149)])
>>> sqlContext.sql("select results from results").first().asDict()
{u'results': [(3.762,),
  (3.47,),
  (3.559,),
  (3.458,),
  (3.229,),
  (3.21,),
  (3.166,),
  (3.276,),
  (3.239,),
  (3.149,)]}
{code}


Actually, it looks like the nested fields are just left as Rows (IPython's 
fancy display logic obscured this in my first example):

{code}
>>> Row(results=[Row(time=1), Row(time=2)]).asDict()
{'results': [Row(time=1), Row(time=2)]}
{code}

Here's the output I'd expect:

{code}
>>> Row(results=[Row(time=1), Row(time=2)]).asDict()
{'results': [{'time': 1}, {'time': 2}]}
{code}

I ran into this issue when trying to use Pandas dataframes to display nested 
data that I queried from Spark SQL.
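
For what it's worth, a recursive helper along the following lines produces the 
output I'd expect. This is just a minimal sketch, not part of the PySpark API: 
the {{as_nested_dict}} name and the list/dict handling are my own assumptions 
about what "recursive conversion" should cover:

{code}
from pyspark.sql import Row

def as_nested_dict(value):
    # Recursively convert Rows (including Rows nested inside lists and
    # dicts) into plain dictionaries; leave all other values untouched.
    # NOTE: hypothetical helper for illustration, not a PySpark API.
    if isinstance(value, Row):
        return {k: as_nested_dict(v) for k, v in value.asDict().items()}
    if isinstance(value, list):
        return [as_nested_dict(v) for v in value]
    if isinstance(value, dict):
        return {k: as_nested_dict(v) for k, v in value.items()}
    return value

print(as_nested_dict(Row(results=[Row(time=1), Row(time=2)])))
# {'results': [{'time': 1}, {'time': 2}]}
{code}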

  was:
In PySpark, you can call {{.asDict()}} on a Spark SQL {{Row}} to convert it
to a dictionary. Unfortunately, this does not convert nested rows to
dictionaries. For example:

{code}
>>> sqlContext.sql("select results from results").first()
Row(results=[Row(time=3.762), Row(time=3.47), Row(time=3.559), Row(time=3.458), 
Row(time=3.229), Row(time=3.21), Row(time=3.166), Row(time=3.276), 
Row(time=3.239), Row(time=3.149)])
>>> sqlContext.sql("select results from results").first().asDict()
{u'results': [(3.762,),
  (3.47,),
  (3.559,),
  (3.458,),
  (3.229,),
  (3.21,),
  (3.166,),
  (3.276,),
  (3.239,),
  (3.149,)]}
{code}


Actually, it looks like the nested fields are just left as Rows (IPython's 
fancy display logic obscured this in my first example):

{code}
>>> Row(results=[Row(time=1), Row(time=2)]).asDict()
{'results': [Row(time=1), Row(time=2)]}
{code}

I ran into this issue when trying to use Pandas dataframes to display nested 
data that I queried from Spark SQL.


> PySparkSQL's Row.asDict() should convert nested rows to dictionaries
> --------------------------------------------------------------------
>
>                 Key: SPARK-4561
>                 URL: https://issues.apache.org/jira/browse/SPARK-4561
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 1.2.0
>            Reporter: Josh Rosen
>
> In PySpark, you can call {{.asDict()}} on a Spark SQL {{Row}} to convert it
> to a dictionary. Unfortunately, this does not convert nested rows to
> dictionaries. For example:
> {code}
> >>> sqlContext.sql("select results from results").first()
> Row(results=[Row(time=3.762), Row(time=3.47), Row(time=3.559), 
> Row(time=3.458), Row(time=3.229), Row(time=3.21), Row(time=3.166), 
> Row(time=3.276), Row(time=3.239), Row(time=3.149)])
> >>> sqlContext.sql("select results from results").first().asDict()
> {u'results': [(3.762,),
>   (3.47,),
>   (3.559,),
>   (3.458,),
>   (3.229,),
>   (3.21,),
>   (3.166,),
>   (3.276,),
>   (3.239,),
>   (3.149,)]}
> {code}
> Actually, it looks like the nested fields are just left as Rows (IPython's 
> fancy display logic obscured this in my first example):
> {code}
> >>> Row(results=[Row(time=1), Row(time=2)]).asDict()
> {'results': [Row(time=1), Row(time=2)]}
> {code}
> Here's the output I'd expect:
> {code}
> >>> Row(results=[Row(time=1), Row(time=2)]).asDict()
> {'results': [{'time': 1}, {'time': 2}]}
> {code}
> I ran into this issue when trying to use Pandas dataframes to display nested 
> data that I queried from Spark SQL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
