[ https://issues.apache.org/jira/browse/SPARK-27442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
angerszhu updated SPARK-27442:
------------------------------
Parent: SPARK-36200
Issue Type: Sub-task (was: Bug)
> ParquetFileFormat fails to read column named with invalid characters
> --------------------------------------------------------------------
>
> Key: SPARK-27442
> URL: https://issues.apache.org/jira/browse/SPARK-27442
> Project: Spark
> Issue Type: Sub-task
> Components: Input/Output
> Affects Versions: 2.0.0, 2.4.1
> Reporter: Jan Vršovský
> Assignee: angerszhu
> Priority: Minor
> Fix For: 3.3.0
>
>
> When reading a Parquet file whose column names contain characters considered
> invalid, the reader fails with the following exception:
> Name: org.apache.spark.sql.AnalysisException
> Message: Attribute name "..." contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.
> Spark should not be able to write such files, but it should be able to read
> them (and allow the user to correct the column names). However, the possible
> workarounds (such as using an alias to rename the column, or forcing another
> schema) do not work, since the check is applied to the input itself.
> (Possible fix: remove the superfluous
> {{ParquetWriteSupport.setSchema(requiredSchema, hadoopConf)}} call from
> {{buildReaderWithPartitionValues}}?)
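To illustrate which column names trip the check, here is a minimal Python sketch. It assumes the rejected set is exactly the characters listed in the error message (space, comma, semicolon, curly braces, parentheses, newline, tab and equals sign). The names INVALID_CHARS, has_invalid_chars and check_field_name are hypothetical; this is not Spark's own implementation.

```python
# Hypothetical sketch: the character set is copied from the AnalysisException
# message in the report. This is NOT Spark's code, only an illustration.
INVALID_CHARS = frozenset(" ,;{}()\n\t=")


def has_invalid_chars(name: str) -> bool:
    """Return True if the column name contains any character from INVALID_CHARS."""
    return any(ch in INVALID_CHARS for ch in name)


def check_field_name(name: str) -> None:
    """Raise ValueError with a message modeled on the reported exception text."""
    if has_invalid_chars(name):
        raise ValueError(
            f'Attribute name "{name}" contains invalid character(s) among '
            '" ,;{}()\\n\\t=". Please use alias to rename it.'
        )


if __name__ == "__main__":
    # Classify a few sample column names under the assumed rule.
    for col in ["user_id", "first name", "a=b", "total(usd)"]:
        print(f"{col!r}: {'rejected' if has_invalid_chars(col) else 'ok'}")
```

Under this sketch, a name such as total(usd) or first name is rejected while user_id passes. The report's point is that a check of this kind runs while reading, so a file written by another tool with such a column name cannot be loaded at all, and renaming via an alias comes too late to help.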
--
This message was sent by Atlassian Jira
(v8.20.7#820007)