[ https://issues.apache.org/jira/browse/SPARK-20470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Philip Adetiloye updated SPARK-20470:
-------------------------------------
Description:
Converting a PySpark RDD whose rows contain an array of structs to JSON does not produce the expected JSON. It looks trivial, but I can't get correct JSON out.
I read the JSON below into a DataFrame:
{code}
{
  "feature": "feature_id_001",
  "histogram": [
    {
      "start": 1.9796095151877942,
      "y": 968.0,
      "width": 0.1564485056196041
    },
    {
      "start": 2.1360580208073983,
      "y": 892.0,
      "width": 0.1564485056196041
    },
    {
      "start": 2.2925065264270024,
      "y": 814.0,
      "width": 0.15644850561960366
    },
    {
      "start": 2.448955032046606,
      "y": 690.0,
      "width": 0.1564485056196041
    }
  ]
}
{code}
The DataFrame schema looks correct:
{code}
root
 |-- feature: string (nullable = true)
 |-- histogram: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- start: double (nullable = true)
 |    |    |-- width: double (nullable = true)
 |    |    |-- y: double (nullable = true)
{code}
Now I need to convert each row to JSON and save it to HBase:
{code}
rdd1 = rdd.map(lambda row: Row(x=json.dumps(row.asDict())))
{code}
Output JSON (wrong):
{code}
{
  "feature": "feature_id_001",
  "histogram": [
    [
      1.9796095151877942,
      968.0,
      0.1564485056196041
    ],
    [
      2.1360580208073983,
      892.0,
      0.1564485056196041
    ],
    [
      2.2925065264270024,
      814.0,
      0.15644850561960366
    ],
    [
      2.448955032046606,
      690.0,
      0.1564485056196041
    ]
  ]
}
{code}
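A likely cause, offered as a hypothesis rather than a confirmed diagnosis: PySpark's Row subclasses tuple, row.asDict() converts only the top level by default, and json.dumps encodes any tuple as a JSON array, so each nested struct loses its field names. The stdlib-only sketch below reproduces that pattern with collections.namedtuple standing in for Row. Bin, Record and to_plain are hypothetical names used for illustration, not Spark APIs.

{code:python}
import json
from collections import namedtuple

# Hypothetical stand-ins for pyspark.sql.Row. Like Row, a namedtuple is a
# tuple subclass, so json.dumps encodes it as a JSON array, not an object.
Bin = namedtuple("Bin", ["start", "width", "y"])
Record = namedtuple("Record", ["feature", "histogram"])


def to_plain(value):
    """Recursively turn namedtuples into dicts, descending into lists."""
    if hasattr(value, "_asdict"):  # namedtuple: convert each field
        return {k: to_plain(v) for k, v in value._asdict().items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(v) for v in value]
    if isinstance(value, dict):
        return {k: to_plain(v) for k, v in value.items()}
    return value


row = Record(
    feature="feature_id_001",
    histogram=[
        Bin(start=1.9796095151877942, width=0.1564485056196041, y=968.0),
        Bin(start=2.1360580208073983, width=0.1564485056196041, y=892.0),
    ],
)

# Shallow conversion (analogous to row.asDict()): the nested bins stay
# tuples, so each one is serialized as a bare JSON array.
shallow_json = json.dumps(dict(row._asdict()))

# Recursive conversion (analogous to row.asDict(recursive=True)): the
# nested bins become dicts and are serialized as JSON objects.
nested_json = json.dumps(to_plain(row))

print(shallow_json)
print(nested_json)
{code}

In the PySpark job itself, the analogous change would presumably be json.dumps(row.asDict(recursive=True)), or letting Spark do the serialization with df.toJSON(), which returns an RDD of JSON strings. Both are suggestions to verify against 1.6.3, not a confirmed resolution of this issue.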
> Invalid json converting RDD row with Array of struct to json
> ------------------------------------------------------------
>
> Key: SPARK-20470
> URL: https://issues.apache.org/jira/browse/SPARK-20470
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.6.3
> Reporter: Philip Adetiloye
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)