[jira] [Updated] (SPARK-27442) ParquetFileFormat fails to read column named with invalid characters

angerszhu (Jira) Wed, 25 May 2022 03:25:04 -0700


     [ https://issues.apache.org/jira/browse/SPARK-27442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


angerszhu updated SPARK-27442:
------------------------------
        Parent: SPARK-36200
    Issue Type: Sub-task  (was: Bug)

> ParquetFileFormat fails to read column named with invalid characters
> --------------------------------------------------------------------
>
>                 Key: SPARK-27442
>                 URL: https://issues.apache.org/jira/browse/SPARK-27442
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Input/Output
>    Affects Versions: 2.0.0, 2.4.1
>            Reporter: Jan Vršovský
>            Assignee: angerszhu
>            Priority: Minor
>             Fix For: 3.3.0
>
>
> When reading a Parquet file whose column names contain characters that Spark 
> considers invalid, the reader fails with the exception:
> Name: org.apache.spark.sql.AnalysisException
> Message: Attribute name "..." contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.
> Spark should not be able to write such files, but it should be able to read 
> them (and allow the user to correct the names). However, the possible 
> workarounds (such as using an alias to rename the column, or forcing another 
> schema) do not work, because the check is performed on the input.
> (Possible fix: remove the superfluous 
> {{ParquetWriteSupport.setSchema(requiredSchema, hadoopConf)}} call from 
> {{buildReaderWithPartitionValues}}?)
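
To make the character set in the error message concrete, here is a minimal sketch in plain Python of a check of this kind. It is not Spark's implementation: the helper names `invalid_chars_in` and `sanitize` are hypothetical, and the only detail taken from the report is the list of characters quoted in the exception message.

```python
# Characters quoted in the AnalysisException message from the report:
# space , ; { } ( ) newline tab =
INVALID_CHARS = " ,;{}()\n\t="


def invalid_chars_in(name):
    """Return the sorted, de-duplicated invalid characters found in a column name.

    Hypothetical helper for illustration; not part of Spark.
    """
    return sorted({ch for ch in name if ch in INVALID_CHARS})


def sanitize(name, replacement="_"):
    """Replace every invalid character with `replacement` (assumed default "_")."""
    table = str.maketrans({ch: replacement for ch in INVALID_CHARS})
    return name.translate(table)


if __name__ == "__main__":
    for column in ["total (USD)", "key=value", "total_usd"]:
        print(repr(column), invalid_chars_in(column), repr(sanitize(column)))
```

According to the report, renaming with an alias inside Spark does not help on the affected versions because the check runs on the input schema, so a rename like this would presumably have to be applied where the file is produced.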



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

