[
https://issues.apache.org/jira/browse/PARQUET-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gabor Szadovszky updated PARQUET-1681:
--------------------------------------
Fix Version/s: (was: 1.11.0)
> Avro's isElementType() change breaks the reading of some parquet(1.8.1) files
> -----------------------------------------------------------------------------
>
> Key: PARQUET-1681
> URL: https://issues.apache.org/jira/browse/PARQUET-1681
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-avro
> Affects Versions: 1.10.0, 1.9.1, 1.11.0
> Reporter: Xinli Shang
> Priority: Critical
>
> When a parquet(1.8.1) file is written with the Avro schema below and then
> read back with parquet 1.10.1 without passing any schema, the read throws
> the exception "XXX is not a group". Reading with parquet 1.8.1 works fine.
> {
>   "name": "phones",
>   "type": [
>     "null",
>     {
>       "type": "array",
>       "items": {
>         "type": "record",
>         "name": "phones_items",
>         "fields": [
>           {
>             "name": "phone_number",
>             "type": ["null", "string"],
>             "default": null
>           }
>         ]
>       }
>     }
>   ],
>   "default": null
> }
> The code used to read the file is as follows:
> val reader = AvroParquetReader.builder[SomeRecordType](parquetPath)
>   .withConf(new Configuration)
>   .build()
> reader.read()
> PARQUET-651 changed the method isElementType() to rely on Avro's
> checkReaderWriterCompatibility() for the compatibility check. However,
> checkReaderWriterCompatibility() considers the Parquet schema and the Avro
> schema (converted from the file schema) incompatible, because the record is
> named ‘phones_items’ in the Avro schema but ‘array’ in the Parquet schema.
> isElementType() therefore returns false, which causes the “phone_number”
> field in the above schema to be treated as a group type, which it is not.
> The exception is then thrown at .asGroupType().
> I have not tried whether writing with parquet 1.10.1 reproduces the same
> problem. It could, because the translation of Avro schema to Parquet schema
> has not changed (I have not verified this yet).
> I hesitate to revert PARQUET-651 because it solved several problems. I would
> like to hear the community's thoughts on it.
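
The name-mismatch reasoning in the description can be illustrated with a small, dependency-free sketch. It is a toy model only: Field, Record, compatible and isElementType below are hypothetical stand-ins written for this illustration, not the parquet-avro or Avro implementations, and the only behaviour assumed is the one the report describes (a check that fails when the record names differ).

```scala
object ElementTypeSketch {
  // Hypothetical, simplified schema model (not Avro's Schema class).
  final case class Field(name: String, typeName: String)
  final case class Record(name: String, fields: List[Field])

  // Toy stand-in for a name-sensitive compatibility check:
  // record names must match and every reader field must exist in the writer.
  def compatible(reader: Record, writer: Record): Boolean =
    reader.name == writer.name &&
      reader.fields.forall(f => writer.fields.contains(f))

  // Toy stand-in for the decision described in the issue: the repeated group
  // is taken as the list element only if the two records are compatible.
  def isElementType(avroElement: Record, repeatedGroup: Record): Boolean =
    compatible(avroElement, repeatedGroup)

  // Element record as named in the Avro schema of the issue.
  val avroElement: Record =
    Record("phones_items", List(Field("phone_number", "string")))

  // Same field, but named 'array' as the issue says the Parquet schema names it.
  val repeatedGroup: Record =
    Record("array", List(Field("phone_number", "string")))

  def main(args: Array[String]): Unit = {
    // Names differ, so the toy check rejects the pair.
    println(isElementType(avroElement, repeatedGroup))
    // With matching names the same fields are accepted.
    println(isElementType(avroElement, repeatedGroup.copy(name = "phones_items")))
  }
}
```

In this model the two records differ only by name, which is enough to make the check fail. That mirrors the report's explanation of why the reader then looks one level deeper and calls .asGroupType() on “phone_number”.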