[https://issues.apache.org/jira/browse/SPARK-27442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400478#comment-17400478]
Dror Speiser commented on SPARK-27442:
--------------------------------------
Hey, I'm going over the Parquet format specification (the GitHub page and the
Thrift file), and I don't see any mention of valid or invalid characters for
field names in schema elements. Was this a restriction in earlier format
specifications?
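For context, the restriction appears to come from Spark rather than the format
itself: the error message quoted below lists the rejected characters. A minimal
sketch of that name check (the helper name here is hypothetical; Spark's actual
check lives in its Parquet schema conversion code):

```python
# Characters listed in the AnalysisException message:
# "invalid character(s) among \" ,;{}()\\n\\t=\""
INVALID_CHARS = set(" ,;{}()\n\t=")

def has_invalid_chars(field_name: str) -> bool:
    """Return True if Spark's check would reject this field name."""
    return any(ch in INVALID_CHARS for ch in field_name)

print(has_invalid_chars("my column"))   # True: contains a space
print(has_invalid_chars("my_column"))   # False: no rejected characters
print(has_invalid_chars("a=b"))         # True: contains '='
```

Note this set says nothing about what the Parquet format itself permits, which
is exactly the question above.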
> ParquetFileFormat fails to read column named with invalid characters
> --------------------------------------------------------------------
>
> Key: SPARK-27442
> URL: https://issues.apache.org/jira/browse/SPARK-27442
> Project: Spark
> Issue Type: Bug
> Components: Input/Output
> Affects Versions: 2.0.0, 2.4.1
> Reporter: Jan Vršovský
> Priority: Minor
>
> When reading a Parquet file whose column names contain characters considered
> invalid, the reader fails with the exception:
> Name: org.apache.spark.sql.AnalysisException
> Message: Attribute name "..." contains invalid character(s) among "
> ,;{}()\n\t=". Please use alias to rename it.
> Spark should not be able to write such files, but it should be able to read
> them (and allow the user to correct them). However, possible workarounds (such
> as using an alias to rename the column, or forcing another schema) do not
> work, since the check is also applied on the read path.
> (Possible fix: remove the superfluous
> {{ParquetWriteSupport.setSchema(requiredSchema, hadoopConf)}} call from
> {{buildReaderWithPartitionValues}}?)
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]