[ https://issues.apache.org/jira/browse/SPARK-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-12436:
------------------------------
Target Version/s: (was: 2.0.0)
> If all values of a JSON field are null, JSON's inferSchema should return
> NullType instead of StringType
> ------------------------------------------------------------------------------------------------------
>
> Key: SPARK-12436
> URL: https://issues.apache.org/jira/browse/SPARK-12436
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Reynold Xin
> Labels: starter
>
> Right now, JSON's inferSchema returns {{StringType}} for a field that
> always has null values and {{ArrayType(StringType)}} for a field that
> always has empty array values. This behavior makes writing JSON data
> to other data sources easy (i.e. when writing data, we do not need to remove
> those {{NullType}} or {{ArrayType(NullType)}} columns). However, it makes it
> hard for downstream applications to reason about the actual schema of the data,
> and thus makes schema merging hard. We should allow JSON's inferSchema to return
> {{NullType}} and {{ArrayType(NullType)}}. We also need to make sure that when
> we write data out, we remove those {{NullType}} or {{ArrayType(NullType)}}
> columns first.
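Below is a minimal, self-contained Python sketch of the proposed inference rule. It uses hypothetical stand-in classes named after Spark's types; these are not `pyspark.sql.types`, and `merge`, `infer`, `infer_field` and `to_legacy` are illustrative names of my own. `NullType` acts as the bottom type when per-record types are merged. A field that is null in every record therefore stays `NullType`, and an always-empty array stays `ArrayType(NullType)`. The `to_legacy` helper models the current behavior described above by reporting `NullType` as `StringType`. The sketch also makes one further assumption: conflicting types widen to `StringType`.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical stand-ins for Spark SQL types; NOT the pyspark.sql.types classes.
@dataclass(frozen=True)
class NullType:
    pass

@dataclass(frozen=True)
class StringType:
    pass

@dataclass(frozen=True)
class LongType:
    pass

@dataclass(frozen=True)
class ArrayType:
    element: Any


def merge(a, b):
    """Combine two inferred types; NullType is the bottom type."""
    if isinstance(a, NullType):
        return b
    if isinstance(b, NullType):
        return a
    if a == b:
        return a
    if isinstance(a, ArrayType) and isinstance(b, ArrayType):
        return ArrayType(merge(a.element, b.element))
    # Sketch assumption: any other conflict widens to StringType.
    return StringType()


def infer(value):
    """Infer the type of one JSON value (only null, string, integer, array)."""
    if value is None:
        return NullType()
    if isinstance(value, str):
        return StringType()
    if isinstance(value, int) and not isinstance(value, bool):
        return LongType()
    if isinstance(value, list):
        element = NullType()  # an empty array keeps ArrayType(NullType)
        for item in value:
            element = merge(element, infer(item))
        return ArrayType(element)
    raise TypeError(f"unsupported JSON value: {value!r}")


def infer_field(records, name):
    """Merge one field's types across all records (a missing key counts as null)."""
    result = NullType()
    for record in records:
        result = merge(result, infer(record.get(name)))
    return result


def to_legacy(t):
    """Model the current behavior: report NullType as StringType."""
    if isinstance(t, NullType):
        return StringType()
    if isinstance(t, ArrayType):
        return ArrayType(to_legacy(t.element))
    return t


records = [{"a": None, "b": [], "c": 1}, {"a": None, "b": [], "c": None}]
print(infer_field(records, "a"))             # NullType()
print(infer_field(records, "b"))             # ArrayType(element=NullType())
print(to_legacy(infer_field(records, "b")))  # ArrayType(element=StringType())
```

Treating `NullType` as the identity of `merge` has a useful consequence: a field that is null in some records and an integer in others still infers as `LongType`. Only fields with no non-null evidence at all remain `NullType`.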
> Besides {{NullType}} and {{ArrayType(NullType)}}, we may need to do the same
> thing for empty {{StructType}}s (i.e. a {{StructType}} having 0 fields).
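Here is a hedged Python sketch of how empty structs could join the other types that carry no column data. It again uses hypothetical stand-in classes rather than real Spark types, and `is_void` is a name introduced only for illustration. The predicate flags `NullType` and a `StructType` with 0 fields. As an assumption of this sketch, it also flags any `ArrayType` whose element type is itself void, which covers `ArrayType(NullType)`.

```python
from dataclasses import dataclass
from typing import Any, Tuple

# Hypothetical stand-ins for Spark SQL types; NOT the pyspark.sql.types classes.
@dataclass(frozen=True)
class NullType:
    pass

@dataclass(frozen=True)
class StringType:
    pass

@dataclass(frozen=True)
class ArrayType:
    element: Any

@dataclass(frozen=True)
class StructType:
    # (field name, field type) pairs; an empty tuple models a struct with 0 fields.
    fields: Tuple[Tuple[str, Any], ...] = ()


def is_void(t):
    """True for types that carry no field data: NullType, empty structs, arrays of void."""
    if isinstance(t, NullType):
        return True
    if isinstance(t, StructType):
        return len(t.fields) == 0
    if isinstance(t, ArrayType):
        return is_void(t.element)  # sketch assumption: arrays of void types are void
    return False


print(is_void(StructType()))                        # True  ({} in every record)
print(is_void(ArrayType(NullType())))               # True  ([] in every record)
print(is_void(StructType((("x", StringType()),))))  # False
```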
> To complete this work, we need to finish the following sub-tasks:
> * Allow JSON's inferSchema to return {{NullType}} and {{ArrayType(NullType)}}.
> * Determine whether we need to add the operation of removing {{NullType}} and
> {{ArrayType(NullType)}} columns from the data being written out for all
> data sources (i.e. data sources based on our data source API and Hive tables),
> or whether we should add this operation only for certain data sources (e.g.
> Parquet). For example, we may not need this operation for Hive because Hive
> has VoidObjectInspector.
> * Implement the change and get it merged to Spark master.
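One possible shape for the removal step on the write path is sketched below in Python. It is hypothetical and is not Spark's writer code: the stand-in classes and the names `prune` and `schema_for_write` are illustrative assumptions. `prune` walks a schema and drops every void part. An array whose element prunes away is dropped, and a struct left with no fields is itself dropped. Whether this should run for every data source, or only for some (e.g. Parquet but not Hive), is the open sub-task above, so the sketch leaves that choice to the caller.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

# Hypothetical stand-ins for Spark SQL types; NOT the pyspark.sql.types classes.
@dataclass(frozen=True)
class NullType:
    pass

@dataclass(frozen=True)
class LongType:
    pass

@dataclass(frozen=True)
class ArrayType:
    element: Any

@dataclass(frozen=True)
class StructType:
    fields: Tuple[Tuple[str, Any], ...] = ()  # (name, type) pairs


def prune(t) -> Optional[Any]:
    """Return t without its void parts, or None if nothing writable is left."""
    if isinstance(t, NullType):
        return None
    if isinstance(t, ArrayType):
        element = prune(t.element)
        return None if element is None else ArrayType(element)
    if isinstance(t, StructType):
        kept = []
        for name, field_type in t.fields:
            pruned = prune(field_type)
            if pruned is not None:
                kept.append((name, pruned))
        return StructType(tuple(kept)) if kept else None
    return t


def schema_for_write(schema):
    """Top-level entry point: always returns a StructType, possibly with 0 fields."""
    pruned = prune(schema)
    return pruned if pruned is not None else StructType()


schema = StructType((
    ("a", NullType()),
    ("b", ArrayType(NullType())),
    ("c", LongType()),
    ("d", StructType((("e", NullType()),))),
))
print(schema_for_write(schema))  # StructType(fields=(('c', LongType()),))
```

Only the schema is handled here. A real writer would also have to project each row onto the pruned schema, including nested struct values, before writing.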
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)