Ian Cook created ARROW-11409:
--------------------------------

             Summary: [Integration] Enable Arrow to read Parquet files from 
Spark 2.x with illegal nulls
                 Key: ARROW-11409
                 URL: https://issues.apache.org/jira/browse/ARROW-11409
             Project: Apache Arrow
          Issue Type: Bug
          Components: Integration
    Affects Versions: 3.0.0
            Reporter: Ian Cook


While running integration tests with Arrow and Spark, I observed that Spark 2.x 
can in some circumstances write Parquet files with illegal nulls in 
non-nullable columns. (This appears to have been fixed in Spark 3.0.) Arrow 
throws an {{Unexpected end of stream}} error when attempting to read such 
invalid Parquet files.

The attached Parquet file, written by Spark 2.0.0, can be used to reproduce 
this behavior. It contains a single column, a non-nullable integer named 
{{x}}, with three records:
{code}
+-----+
|    x|
+-----+
|    1|
| null|
|    3|
+-----+ 
{code}
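To make the invalid state concrete, here is a minimal Python sketch that flags 
nulls in columns declared non-nullable, using the three records above. It is 
not Arrow or Spark code: the function name {{find_illegal_nulls}} and the 
dict-based schema are hypothetical, and the rows are an in-memory stand-in for 
the attached file.

```python
from typing import Any, Dict, List, Tuple


def find_illegal_nulls(
    nullable: Dict[str, bool],
    rows: List[Dict[str, Any]],
) -> List[Tuple[int, str]]:
    """Return (row_index, column_name) for each null in a non-nullable column.

    `nullable` maps column name -> whether nulls are allowed (hypothetical
    schema representation, chosen only for this illustration).
    """
    violations = []
    for i, row in enumerate(rows):
        for name, is_nullable in nullable.items():
            # A missing key is treated the same as an explicit null.
            if not is_nullable and row.get(name) is None:
                violations.append((i, name))
    return violations


# Stand-in for the attached file: one non-nullable integer column named x.
schema = {"x": False}
rows = [{"x": 1}, {"x": None}, {"x": 3}]

print(find_illegal_nulls(schema, rows))  # [(1, 'x')]
```

A check of this kind on the writer side would reject the second record before 
a file declaring {{x}} as non-nullable is ever produced.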
This issue is for awareness only. I expect this should be closed as "won't fix".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
