mike created SPARK-36983:
----------------------------
Summary: ignoreCorruptFiles does not work when schema changes from int
to string
Key: SPARK-36983
URL: https://issues.apache.org/jira/browse/SPARK-36983
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.1.2, 2.4.8
Reporter: mike
Precondition:
Folder A contains two parquet files:
* File 1: has several columns, one of which is column X with data type Int
* File 2: same schema as File 1, except that column X has data type String
Read file 1 to get its schema.
Read folder A using the schema of file 1.
Expected: The read succeeds and file 2 is ignored, because the data type of
column X changed to String.
Actual: File 2 is apparently not ignored, and the read fails with this error:
`WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2) (192.168.1.78
executor driver): java.lang.UnsupportedOperationException:
org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary
at org.apache.parquet.column.Dictionary.decodeToInt(Dictionary.java:45)`
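
The steps above could be sketched in PySpark roughly as follows. This is only an illustration, not the reporter's actual code: the helper names, the column `id`, the sample values, and the `file1`/`file2` sub-paths are all hypothetical, and the two data sets are passed as explicit paths rather than by reading folder A directly. It assumes the option in question is `spark.sql.files.ignoreCorruptFiles`.

```python
import os

# Spark SQL setting that the report expects to make file 2 be skipped.
IGNORE_CORRUPT_KEY = "spark.sql.files.ignoreCorruptFiles"


def write_conflicting_files(spark, folder):
    """Write two parquet data sets whose column X differs in type.

    The sub-paths and sample rows are made up for this sketch.
    """
    file1 = os.path.join(folder, "file1")
    file2 = os.path.join(folder, "file2")
    # File 1: column X is an integer.
    spark.createDataFrame([(1, 10)], "id INT, X INT") \
        .write.mode("overwrite").parquet(file1)
    # File 2: same columns, but X is a string.
    spark.createDataFrame([(2, "ten")], "id INT, X STRING") \
        .write.mode("overwrite").parquet(file2)
    return file1, file2


def read_with_schema_of_first(spark, file1, paths):
    """Read `paths` using the schema inferred from `file1`."""
    spark.conf.set(IGNORE_CORRUPT_KEY, "true")
    # Step 1: read file 1 only to obtain its schema.
    schema = spark.read.parquet(file1).schema
    # Step 2: read all paths with that schema applied.
    return spark.read.schema(schema).parquet(*paths)
```

With a real `SparkSession` named `spark`, one would call `write_conflicting_files(spark, "A")`, pass the returned paths to `read_with_schema_of_first`, and then trigger an action such as `.show()` on the result. According to the report, that action is where the task fails on the affected versions instead of skipping file 2.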