[ https://issues.apache.org/jira/browse/ARROW-11409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ian Cook updated ARROW-11409:
-----------------------------
Attachment: spark_2.0.0_illegal_null.parquet
> [Integration] Enable Arrow to read Parquet files from Spark 2.x with illegal
> nulls
> ----------------------------------------------------------------------------------
>
> Key: ARROW-11409
> URL: https://issues.apache.org/jira/browse/ARROW-11409
> Project: Apache Arrow
> Issue Type: Bug
> Components: Integration
> Affects Versions: 3.0.0
> Reporter: Ian Cook
> Priority: Minor
> Attachments: spark_2.0.0_illegal_null.parquet
>
>
> While running integration tests with Arrow and Spark, I observed that Spark
> 2.x can, in some circumstances, write Parquet files that contain illegal
> nulls in non-nullable columns. (This appears to have been fixed in Spark
> 3.0.) Arrow throws an {{Unexpected end of stream}} error when attempting to
> read such illegal Parquet files.
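>
> For illustration only, here is a minimal PySpark sketch of one plausible way
> such a file could be produced on Spark 2.x: the schema declares {{x}}
> non-nullable, but schema verification is disabled, so the null reaches the
> Parquet writer. This is an assumption about the mechanism, not necessarily
> how the attached file was generated:
> {code:python}
> # Hypothetical repro sketch for Spark 2.x (assumes PySpark >= 2.1, where
> # createDataFrame accepts the verifySchema keyword). Not a confirmed
> # reproduction of how the attached file was written.
> from pyspark.sql import SparkSession
> from pyspark.sql.types import IntegerType, StructField, StructType
>
> spark = SparkSession.builder.appName("illegal-null-repro").getOrCreate()
>
> # Declare x as non-nullable, but feed it a null anyway; with schema
> # verification turned off, the row is not rejected.
> schema = StructType([StructField("x", IntegerType(), nullable=False)])
> df = spark.createDataFrame([(1,), (None,), (3,)], schema, verifySchema=False)
>
> # Writes a directory of Parquet part files whose schema declares column x
> # as required, even though a null value was written.
> df.write.parquet("spark_2.0.0_illegal_null.parquet")
> {code}
>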
> The attached Parquet file, written by Spark 2.0.0, can be used to reproduce
> this behavior. It contains a single non-nullable integer column named {{x}}
> with three records:
> {code}
> +-----+
> | x|
> +-----+
> | 1|
> | null|
> | 3|
> +-----+
> {code}
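>
> As a concrete read-side check, the sketch below attempts to read the
> attachment with pyarrow. It assumes the footer metadata itself is intact
> (only data page decoding fails); the exact exception type raised is an
> assumption, so it is caught broadly:
> {code:python}
> import pyarrow.parquet as pq
>
> # The footer metadata should be readable; the schema shows column x
> # declared as required (non-nullable).
> parquet_file = pq.ParquetFile("spark_2.0.0_illegal_null.parquet")
> print(parquet_file.schema)
>
> # Decoding the data pages fails because a null is present where the schema
> # promises there are none; on Arrow 3.0.0 this surfaces as an
> # "Unexpected end of stream" error.
> try:
>     table = parquet_file.read()
>     print(table)
> except Exception as exc:
>     print(f"Read failed: {exc}")
> {code}
>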
> This issue is filed for awareness only. I expect it will be closed as
> "won't fix".