[ https://issues.apache.org/jira/browse/SPARK-20470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Philip Adetiloye updated SPARK-20470:
-------------------------------------
    Description:
Converting an RDD in PySpark whose rows contain an array of structs does not produce the expected JSON. It looks trivial, but I can't get correct JSON out.

I read the JSON below into a DataFrame:

{code}
{
  "feature": "feature_id_001",
  "histogram": [
    {
      "start": 1.9796095151877942,
      "y": 968.0,
      "width": 0.1564485056196041
    },
    {
      "start": 2.1360580208073983,
      "y": 892.0,
      "width": 0.1564485056196041
    },
    {
      "start": 2.2925065264270024,
      "y": 814.0,
      "width": 0.15644850561960366
    },
    {
      "start": 2.448955032046606,
      "y": 690.0,
      "width": 0.1564485056196041
    }
  ]
}
{code}

The DataFrame schema looks correct:

{code}
root
 |-- feature: string (nullable = true)
 |-- histogram: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- start: double (nullable = true)
 |    |    |-- width: double (nullable = true)
 |    |    |-- y: double (nullable = true)
{code}

Now I need to convert each row to JSON and save it to HBase:

{code}
import json
from pyspark.sql import Row

# Serialize each row to a JSON string and wrap it in a single-field Row
rdd1 = rdd.map(lambda row: Row(x=json.dumps(row.asDict())))
{code}

Output JSON (wrong): the structs in histogram come out as arrays instead of objects:

{code}
{
  "feature": "feature_id_001",
  "histogram": [
    [
      1.9796095151877942,
      968.0,
      0.1564485056196041
    ],
    [
      2.1360580208073983,
      892.0,
      0.1564485056196041
    ],
    [
      2.2925065264270024,
      814.0,
      0.15644850561960366
    ],
    [
      2.448955032046606,
      690.0,
      0.1564485056196041
    ]
}
{code}

> Invalid json converting RDD row with Array of struct to json
> ------------------------------------------------------------
>
>                 Key: SPARK-20470
>                 URL: https://issues.apache.org/jira/browse/SPARK-20470
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.6.3
>            Reporter: Philip Adetiloye
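A likely cause, assuming standard PySpark behaviour: `Row.asDict()` is shallow by default (its `recursive` flag defaults to `False`), so the structs inside `histogram` stay `Row` objects. Because `pyspark.sql.Row` subclasses `tuple`, `json.dumps` writes each of them as a JSON array. The sketch below reproduces that effect with the standard library only. `collections.namedtuple` stands in for `Row` (both are tuple subclasses), and `Bin`, `Record`, and `to_plain` are hypothetical names introduced for illustration; `to_plain` mimics what a recursive conversion does and is not PySpark's code.

```python
import json
from collections import namedtuple

# Hypothetical stand-ins for pyspark.sql.Row: like Row, namedtuple subclasses tuple.
# Field order follows the inferred schema above (start, width, y).
Bin = namedtuple("Bin", ["start", "width", "y"])
Record = namedtuple("Record", ["feature", "histogram"])


def to_plain(value):
    """Hypothetical helper: recursively turn tuple-based records into dicts and
    lists, approximating the effect of Row.asDict(recursive=True)."""
    if isinstance(value, tuple) and hasattr(value, "_asdict"):
        # A named record: keep field names so it serializes as a JSON object
        return {k: to_plain(v) for k, v in value._asdict().items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(v) for v in value]
    if isinstance(value, dict):
        return {k: to_plain(v) for k, v in value.items()}
    return value


row = Record(
    feature="feature_id_001",
    histogram=[Bin(start=1.9796095151877942, width=0.1564485056196041, y=968.0)],
)

# Shallow conversion: only the top level becomes a dict, so the nested
# records are still tuples and json.dumps writes them as JSON arrays.
shallow = json.dumps(row._asdict())

# Recursive conversion: nested records become dicts, i.e. JSON objects.
deep = json.dumps(to_plain(row))

print(shallow)
print(deep)
```

In the reporter's snippet, the corresponding change would be `json.dumps(row.asDict(recursive=True))`; `DataFrame.toJSON()`, which returns an RDD of JSON strings, is another option. Both are suggestions to check against the 1.6.3 API rather than fixes confirmed in this issue.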