[
https://issues.apache.org/jira/browse/SPARK-27442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475894#comment-17475894
]
Wenchen Fan commented on SPARK-27442:
-------------------------------------
I think we should fix this. It's OK for Spark to forbid special characters
in column names when writing, but when reading existing Parquet files there
is no point in rejecting them on the Spark side.
[~angerszhuuu] can you take a look? Thanks!
> ParquetFileFormat fails to read column named with invalid characters
> --------------------------------------------------------------------
>
> Key: SPARK-27442
> URL: https://issues.apache.org/jira/browse/SPARK-27442
> Project: Spark
> Issue Type: Bug
> Components: Input/Output
> Affects Versions: 2.0.0, 2.4.1
> Reporter: Jan Vršovský
> Priority: Minor
>
> When reading a Parquet file whose column names contain characters Spark
> considers invalid, the reader fails with an exception:
> Name: org.apache.spark.sql.AnalysisException
> Message: Attribute name "..." contains invalid character(s) among "
> ,;{}()\n\t=". Please use alias to rename it.
> Spark should not write such files, but it should be able to read them (and
> allow the user to rename the columns). However, possible workarounds (such
> as using an alias to rename the column, or forcing another schema) do not
> work, since the check is applied to the input schema before any rename can
> take effect.
> (Possible fix: remove the superfluous
> {{ParquetWriteSupport.setSchema(requiredSchema, hadoopConf)}} call from
> {{buildReaderWithPartitionValues}}?)
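To illustrate the failure mode: the error comes from a writer-side name check
that is also run on the read path. The sketch below is a hypothetical
re-implementation of that check in plain Python (the real check lives in
Spark's Scala code; the character set and message are taken from the
exception quoted above). It shows why a file written by another tool can
become unreadable even though the data itself is fine.

```python
# Hypothetical sketch of Spark's Parquet column-name check (illustration
# only; the actual implementation is in Spark's Scala sources).

# Characters the error message above lists as forbidden.
INVALID_CHARS = set(" ,;{}()\n\t=")

def check_field_name(name: str) -> None:
    """Raise if the column name contains a forbidden character.

    Spark applies this check when *writing* Parquet; the bug reported here
    is that the same check also runs on the *read* path, so a file produced
    by another tool with such a name cannot even be opened to rename it.
    """
    if any(c in INVALID_CHARS for c in name):
        raise ValueError(
            f'Attribute name "{name}" contains invalid character(s) '
            'among " ,;{}()\\n\\t=". Please use alias to rename it.'
        )

check_field_name("valid_name")      # passes silently
try:
    check_field_name("bad name")    # space is among the forbidden characters
except ValueError as e:
    print(e)
```

Removing the check from the read path (as suggested above) would let the
file load, after which the user could rename the offending column with an
alias and rewrite it.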
--
This message was sent by Atlassian Jira
(v8.20.1#820001)