[ https://issues.apache.org/jira/browse/ARROW-11409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273173#comment-17273173 ]

Ian Cook commented on ARROW-11409:
----------------------------------

FYI [~bryanc]

> [Integration] Enable Arrow to read Parquet files from Spark 2.x with illegal 
> nulls
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-11409
>                 URL: https://issues.apache.org/jira/browse/ARROW-11409
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Integration
>    Affects Versions: 3.0.0
>            Reporter: Ian Cook
>            Priority: Minor
>
> While running integration tests with Arrow and Spark, I observed that Spark 
> 2.x can in some circumstances write Parquet files with illegal nulls in 
> non-nullable columns. (This appears to have been fixed in Spark 3.0.) Arrow 
> throws an {{Unexpected end of stream}} error when attempting to read illegal 
> Parquet files like this.
>
> The attached Parquet file written by Spark 2.0.0 can be used to repro this 
> behavior. It contains only one column, a non-nullable integer named {{x}}, 
> with three records:
> {code:java}
> +-----+
> |    x|
> +-----+
> |    1|
> | null|
> |    3|
> +-----+ 
> {code}
> This issue is for awareness only. I expect this should be closed as "won't 
> fix".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
