Takuya Ueshin created SPARK-3037:
------------------------------------

             Summary: Add ArrayType containing null value support to Parquet.
                 Key: SPARK-3037
                 URL: https://issues.apache.org/jira/browse/SPARK-3037
             Project: Spark
          Issue Type: Bug
          Components: SQL
            Reporter: Takuya Ueshin
            Priority: Blocker


Parquet support should handle {{ArrayType}} when {{containsNull}} is {{true}}.

When {{containsNull}} is {{true}}, the schema should be as follows:

{noformat}
message root {
  optional group a (LIST) {
    repeated group bag {
      optional int32 array_element;
    }
  }
}
{noformat}
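
A minimal sketch of how values would map onto the schema above. It is plain Python used as a model, not Spark or Parquet code; the helper names {{array_schema}}, {{to_bag_records}} and {{from_bag_records}} are hypothetical and exist only for illustration. Each array slot becomes one {{bag}} repetition, and a null element is a {{bag}} entry whose optional {{array_element}} is absent.

```python
from typing import Any, Dict, List, Optional


def array_schema(field_name: str, element_type: str = "int32") -> str:
    """Render the schema quoted above for an array whose elements may be null."""
    return (
        "message root {\n"
        f"  optional group {field_name} (LIST) {{\n"
        "    repeated group bag {\n"
        f"      optional {element_type} array_element;\n"
        "    }\n"
        "  }\n"
        "}"
    )


def to_bag_records(values: Optional[List[Optional[int]]]) -> Optional[Dict[str, Any]]:
    """Model an array as the nested group: one `bag` entry per slot.

    A null element is represented by an entry with no `array_element` key,
    mirroring an absent optional field. A null array maps to None.
    """
    if values is None:
        return None
    return {"bag": [{} if v is None else {"array_element": v} for v in values]}


def from_bag_records(group: Optional[Dict[str, Any]]) -> Optional[List[Optional[int]]]:
    """Inverse of to_bag_records: an absent `array_element` reads back as None."""
    if group is None:
        return None
    return [entry.get("array_element") for entry in group["bag"]]


if __name__ == "__main__":
    print(array_schema("a"))
    print(to_bag_records([1, None, 3]))
```

The wrapping {{bag}} group is what lets a null occupy a position in the array: the slot is still counted as a repetition, and only the optional field inside it is missing.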

FYI:
Hive's Parquet writer *always* uses this schema, and its reader can read only 
this schema, i.e. the current Parquet support in Spark SQL is not compatible 
with Hive.

NOTICE:
If Hive compatibility is the top priority, we also have to use this schema 
regardless of {{containsNull}}, which will break backward compatibility.
However, using this schema could affect performance.




