mike created SPARK-36983:
----------------------------

             Summary: ignoreCorruptFiles does not work when schema changes from 
int to string
                 Key: SPARK-36983
                 URL: https://issues.apache.org/jira/browse/SPARK-36983
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.1.2, 2.4.8
            Reporter: mike


Precondition:

Folder A contains two Parquet files:
 * File 1: has several columns, one of which is column X with data type Int
 * File 2: same schema as File 1, except that column X has data type String

Read file 1 to get its schema.

Read folder A using the schema of file 1, with ignoreCorruptFiles enabled.

Expected: The read succeeds and file 2 is ignored, because the data type of 
column X changed to String.

Actual: File 2 does not appear to be ignored, and the read fails with this error:

 `WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2) (192.168.1.78 executor driver): java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary
 at org.apache.parquet.column.Dictionary.decodeToInt(Dictionary.java:45)`
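
The precondition and read steps above can be sketched in PySpark roughly as follows. This is a minimal sketch, not the reporter's actual code: the folder path, the `id` column, the sample rows, and the function names are all assumptions made for illustration. It also assumes the option is enabled through the `spark.sql.files.ignoreCorruptFiles` configuration, and it captures the schema of file 1 before file 2 is appended to the same folder.

```python
def reproduce(spark, folder):
    """Recreate the two-file layout from the report and read it back.

    `spark` is an active SparkSession. `folder` is a hypothetical scratch
    path standing in for "folder A" (assumption, not from the report).
    """
    # Ask Spark to skip files it considers corrupt while reading.
    spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")

    # File 1: column X is an integer (sample rows are made up).
    file1 = spark.createDataFrame([(1, 10), (2, 20)], "id: int, X: int")
    file1.coalesce(1).write.mode("overwrite").parquet(folder)

    # Capture the schema while only file 1 is present in the folder.
    schema = spark.read.parquet(folder).schema

    # File 2: same columns, but X is now a string.
    file2 = spark.createDataFrame([(3, "a"), (4, "b")], "id: int, X: string")
    file2.coalesce(1).write.mode("append").parquet(folder)

    # Read the whole folder with the schema of file 1. According to the
    # report, this is the step that raises UnsupportedOperationException.
    return spark.read.schema(schema).parquet(folder).collect()


def main():
    # Imported here so the module still loads where PySpark is not installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("SPARK-36983-sketch")
        .getOrCreate()
    )
    try:
        print(reproduce(spark, "/tmp/spark-36983-folder-a"))
    finally:
        spark.stop()
```

Calling `main()` on one of the affected versions would be one way to check whether the behaviour described in this report shows up; whether it does may depend on reader settings such as the vectorized Parquet reader.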

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
