Takuya Ueshin created SPARK-3037:
------------------------------------
Summary: Add ArrayType containing null value support to Parquet.
Key: SPARK-3037
URL: https://issues.apache.org/jira/browse/SPARK-3037
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: Takuya Ueshin
Priority: Blocker
Parquet support should handle {{ArrayType}} when {{containsNull}} is {{true}}.
When {{containsNull}} is {{true}}, the schema should be as follows:
{noformat}
message root {
  optional group a (LIST) {
    repeated group bag {
      optional int32 array_element;
    }
  }
}
{noformat}
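As a rough illustration of why the extra {{bag}} level is needed: a repeated field cannot itself be null, so each element is wrapped in a repeated group whose single field is optional. The sketch below models that nesting with plain Python dicts. It does not use Spark or any Parquet library, and the helper names {{to_bag_list}} and {{from_bag_list}} are hypothetical, chosen only for this example.

```python
from typing import Any, Dict, List, Optional


def to_bag_list(values: Optional[List[Any]]) -> Optional[Dict[str, Any]]:
    """Model the three-level layout: LIST group -> repeated 'bag' -> optional element.

    Hypothetical helper for illustration only; it builds plain dicts,
    not real Parquet records.
    """
    if values is None:
        # The outer group 'a' is optional, so the whole array may be null.
        return None
    # Each element gets its own 'bag' entry; a null element is an entry
    # whose optional 'array_element' field is None.
    return {"bag": [{"array_element": v} for v in values]}


def from_bag_list(group: Optional[Dict[str, Any]]) -> Optional[List[Any]]:
    """Inverse of to_bag_list: unwrap the 'bag' entries back into a list."""
    if group is None:
        return None
    return [entry["array_element"] for entry in group["bag"]]


# One record for the schema above, with a null in the middle of the array.
record = {"a": to_bag_list([1, None, 3])}
```

Unwrapping {{record["a"]}} with {{from_bag_list}} gives back the original list with the null element still in its original position, which is the case a bare repeated field has no way to represent.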
FYI:
Hive's Parquet writer *always* uses this schema, and its reader can read only
this schema, so the current Parquet support in Spark SQL is not compatible with
Hive.
NOTICE:
If Hive compatibility is the top priority, we also have to use this schema
regardless of {{containsNull}}, which will break backward compatibility.
Note also that using this schema could affect performance.
--
This message was sent by Atlassian JIRA
(v6.2#6252)