Thomas Omans created PARQUET-465:
------------------------------------
Summary: Parquet-Avro does not support field removal
Key: PARQUET-465
URL: https://issues.apache.org/jira/browse/PARQUET-465
Project: Parquet
Issue Type: Bug
Components: parquet-avro
Affects Versions: 1.8.0
Reporter: Thomas Omans
Parquet-Avro does not support removing fields from the read schema when it is
used with the new compatibility layer.
Given a Parquet file written by parquet-avro with schema version v1:
{code}
record FooBar {
  long foo;
  string bar;
}
{code}
And the following configuration settings:
{code}
job.getConfiguration.setBoolean(AvroReadSupport.AVRO_COMPATIBILITY, false)
AvroParquetInputFormat.setAvroReadSchema(job, avroReaderSchema)
{code}
A job fails when trying to read it using schema version v2:
{code}
record FooBar {
  string bar;
}
{code}
With the error:
{code}
org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema mismatch: Avro field 'foo' not found
    at org.apache.parquet.avro.AvroRecordConverter.getAvroField(AvroRecordConverter.java:159)
{code}
It looks like the converter sees the field in the file's original (v1) schema
and assumes the read schema must also declare it, but in this case the field
was simply removed. Avro schema resolution says that a field present in the
writer's schema but absent from the reader's schema is ignored, since it is
not relevant in the new version.
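To illustrate the behaviour I would expect, here is a minimal sketch of the resolution rule for flat records, driven by the reader's field list. It does not use Avro or parquet-avro at all: the class `ReaderDrivenResolution` and its `resolve` method are hypothetical names made up for this example, and records are modelled as plain maps. It is not a proposed patch for `AvroRecordConverter`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Avro or parquet-avro. It models only the
// "ignore written fields the reader does not declare" rule for flat records.
final class ReaderDrivenResolution {

    // Builds the reader's view of a record by walking the READER's field list
    // and looking each name up in the record as it was written.
    static Map<String, Object> resolve(Map<String, Object> writtenRecord,
                                       List<String> readerFields) {
        Map<String, Object> resolved = new LinkedHashMap<>();
        for (String name : readerFields) {
            if (!writtenRecord.containsKey(name)) {
                // In Avro, a reader field with no written value needs a default;
                // defaults are out of scope for this sketch, so fail loudly.
                throw new IllegalArgumentException(
                        "No written value for reader field '" + name + "'");
            }
            resolved.put(name, writtenRecord.get(name));
        }
        // Written fields that are absent from readerFields (such as 'foo') are
        // never visited, so they are dropped without raising an error.
        return resolved;
    }

    public static void main(String[] args) {
        // A record as written with v1 of FooBar.
        Map<String, Object> v1Record = new LinkedHashMap<>();
        v1Record.put("foo", 42L);
        v1Record.put("bar", "hello");

        // v2 of FooBar declares only 'bar'.
        Map<String, Object> v2View = resolve(v1Record, List.of("bar"));
        System.out.println(v2View); // prints {bar=hello}
    }
}
```

The point of the sketch is the direction of the lookup: iterating over the reader's fields means a removed field never triggers a "not found" error, whereas iterating over the file's fields and requiring each one in the read schema produces the failure above.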