Ian Cook created ARROW-11409:
--------------------------------
Summary: [Integration] Enable Arrow to read Parquet files from
Spark 2.x with illegal nulls
Key: ARROW-11409
URL: https://issues.apache.org/jira/browse/ARROW-11409
Project: Apache Arrow
Issue Type: Bug
Components: Integration
Affects Versions: 3.0.0
Reporter: Ian Cook
While running integration tests with Arrow and Spark, I observed that Spark 2.x
can, in some circumstances, write Parquet files that contain nulls in columns
declared non-nullable, which is illegal. (This appears to have been fixed in
Spark 3.0.) Arrow throws an {{Unexpected end of stream}} error when attempting
to read such illegal Parquet files.
The attached Parquet file, written by Spark 2.0.0, can be used to reproduce this
behavior. It contains only one column, a non-nullable integer named {{x}}, with
three records:
{code:java}
+-----+
|    x|
+-----+
|    1|
| null|
|    3|
+-----+
{code}
This issue is for awareness only. I expect this should be closed as "won't fix".
--
This message was sent by Atlassian Jira
(v8.3.4#803005)