Miguel Cabrera created SPARK-18180:
--------------------------------------

             Summary: pyspark.sql.Row does not serialize well to json
                 Key: SPARK-18180
                 URL: https://issues.apache.org/jira/browse/SPARK-18180
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.0.1
         Environment: HDP 2.3.4, Spark 2.0.1
            Reporter: Miguel Cabrera
{{Row}} does not serialize to JSON automatically. Although {{Row}} objects are dict-like in Python, the {{json}} module does not seem to be able to serialize them:

{noformat}
from pyspark.sql import Row
import json

r = Row(field1='hello', field2='world')
json.dumps(r)
{noformat}

Results:

{noformat}
'["hello", "world"]'
{noformat}

Expected:

{noformat}
'{"field1": "hello", "field2": "world"}'
{noformat}

The workaround is to call the {{asDict()}} method of {{Row}}. However, this makes custom serialization of nested objects really painful, as the caller has to be aware that it is serializing a {{Row}} object. In particular, with SPARK-17695 you cannot serialize DataFrames easily if you have some empty or null fields, so you have to customize the serialization process.
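The reason a custom {{json.JSONEncoder.default()}} hook does not help here is that {{Row}} is a subclass of {{tuple}}, and the encoder serializes tuples as JSON arrays before {{default()}} is ever consulted. The usual workaround is a recursive pre-conversion to plain dicts before calling {{json.dumps}}. A minimal sketch of that approach ({{FakeRow}} is a hypothetical stand-in for {{pyspark.sql.Row}} so the snippet runs without a Spark installation; the real {{Row}} behaves the same way, being a tuple subclass that exposes {{asDict()}}):

```python
import json

# Hypothetical stand-in for pyspark.sql.Row: a tuple subclass with an
# asDict() method. Because it IS a tuple, json.dumps emits a JSON array
# and never calls a custom JSONEncoder.default() for it.
class FakeRow(tuple):
    def __new__(cls, **kwargs):
        self = super().__new__(cls, kwargs.values())
        self._field_names = list(kwargs)
        return self

    def asDict(self):
        return dict(zip(self._field_names, self))

def row_to_dict(obj):
    """Recursively replace Row-like objects with plain dicts before dumping."""
    if hasattr(obj, "asDict"):
        return {k: row_to_dict(v) for k, v in obj.asDict().items()}
    if isinstance(obj, (list, tuple)):
        return [row_to_dict(v) for v in obj]
    return obj

r = FakeRow(field1="hello", field2="world")
print(json.dumps(r))               # serialized as a JSON array: ["hello", "world"]
print(json.dumps(row_to_dict(r)))  # {"field1": "hello", "field2": "world"}
```

The recursion is what makes nested structures (e.g. a {{Row}} containing a list of {{Row}} objects) survive the conversion, which is exactly the case the {{asDict()}} workaround makes painful when done by hand.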