[ https://issues.apache.org/jira/browse/ARROW-11409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ian Cook updated ARROW-11409:
-----------------------------
Attachment: spark_2.0.0_illegal_null.parquet
> [Integration] Enable Arrow to read Parquet files from Spark 2.x with illegal
> nulls
> ----------------------------------------------------------------------------------
>
> Key: ARROW-11409
> URL: https://issues.apache.org/jira/browse/ARROW-11409
> Project: Apache Arrow
> Issue Type: Bug
> Components: Integration
> Affects Versions: 3.0.0
> Reporter: Ian Cook
> Priority: Minor
> Attachments: spark_2.0.0_illegal_null.parquet
>
>
> While running integration tests with Arrow and Spark, I observed that Spark
> 2.x can, in some circumstances, write Parquet files that contain illegal
> nulls in non-nullable columns. (This appears to have been fixed in Spark
> 3.0.) Arrow throws an {{Unexpected end of stream}} error when attempting to
> read such illegal Parquet files.
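>
> For illustration only, here is a minimal PySpark sketch of one plausible way
> such a file could be produced on Spark 2.x: the schema declares {{x}}
> non-nullable, but schema verification is disabled, so the null reaches the
> Parquet writer. This is an assumption about the mechanism, not necessarily
> how the attached file was generated:
> {code:python}
> # Hypothetical repro sketch for Spark 2.x (assumes PySpark >= 2.1, where
> # createDataFrame accepts the verifySchema keyword). Not a confirmed
> # reproduction of how the attached file was written.
> from pyspark.sql import SparkSession
> from pyspark.sql.types import IntegerType, StructField, StructType
>
> spark = SparkSession.builder.appName("illegal-null-repro").getOrCreate()
>
> # Declare x as non-nullable, but feed it a null anyway; with schema
> # verification turned off, the row is not rejected.
> schema = StructType([StructField("x", IntegerType(), nullable=False)])
> df = spark.createDataFrame([(1,), (None,), (3,)], schema, verifySchema=False)
>
> # Writes a directory of Parquet part files whose schema declares column x
> # as required, even though a null value was written.
> df.write.parquet("spark_2.0.0_illegal_null.parquet")
> {code}
>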
> The attached Parquet file, written by Spark 2.0.0, can be used to reproduce
> this behavior. It contains a single non-nullable integer column named {{x}}
> with three records:
> {code}
> +-----+
> | x|
> +-----+
> | 1|
> | null|
> | 3|
> +-----+
> {code}
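>
> As a concrete read-side check, the sketch below attempts to read the
> attachment with pyarrow. It assumes the footer metadata itself is intact
> (only data page decoding fails); the exact exception type raised is an
> assumption, so it is caught broadly:
> {code:python}
> import pyarrow.parquet as pq
>
> # The footer metadata should be readable; the schema shows column x
> # declared as required (non-nullable).
> parquet_file = pq.ParquetFile("spark_2.0.0_illegal_null.parquet")
> print(parquet_file.schema)
>
> # Decoding the data pages fails because a null is present where the schema
> # promises there are none; on Arrow 3.0.0 this surfaces as an
> # "Unexpected end of stream" error.
> try:
>     table = parquet_file.read()
>     print(table)
> except Exception as exc:
>     print(f"Read failed: {exc}")
> {code}
>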
> This issue is filed for awareness only. I expect it will be closed as
> "won't fix".