Thomas Omans created PARQUET-465:
------------------------------------

             Summary: Parquet-Avro does not support field removal
                 Key: PARQUET-465
                 URL: https://issues.apache.org/jira/browse/PARQUET-465
             Project: Parquet
          Issue Type: Bug
          Components: parquet-avro
    Affects Versions: 1.8.0
            Reporter: Thomas Omans


Parquet-Avro does not support removing fields when used with the new
compatibility layer:

Given a Parquet file written by parquet-avro with the following schema (v1):

{code}
record FooBar {
  long foo;
  string bar;
}
{code}

And the following configuration settings:

{code}
job.getConfiguration.setBoolean(AvroReadSupport.AVRO_COMPATIBILITY, false)
AvroParquetInputFormat.setAvroReadSchema(job, avroReaderSchema)
{code}

A job that reads the file with the following schema (v2) fails:

{code}
record FooBar {
  string bar;
}
{code}

With the error:

{code}
org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema mismatch: Avro field 'foo' not found
        at org.apache.parquet.avro.AvroRecordConverter.getAvroField(AvroRecordConverter.java:159)
{code}

It looks like the converter sees the field in the file's schema (v1) and assumes
the read schema (v2) must contain it too, but in this case the field was simply
removed. Avro schema resolution dictates that a field present in the writer's
schema but absent from the reader's schema is ignored, since it is not relevant
in the new version.
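
A minimal sketch of the resolution rule described above, in plain Java with no
Avro dependency. It is not the Avro or parquet-avro implementation: the class and
method names ({{FieldResolutionSketch}}, {{resolve}}, {{fieldsToSkip}}) are
hypothetical, and records are modeled as simple name-to-value maps. Fields are
matched by name, and a writer-only field such as {{foo}} is dropped rather than
raising an error:

{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Avro's or parquet-avro's code) of the Avro
 * schema-resolution rule for record fields: fields are matched by name, and a
 * field in the writer schema but absent from the reader schema is skipped.
 */
final class FieldResolutionSketch {

    /** Writer fields the reader must skip: present in the writer, absent in the reader. */
    static List<String> fieldsToSkip(List<String> writerFields, List<String> readerFields) {
        List<String> skipped = new ArrayList<>();
        for (String name : writerFields) {
            if (!readerFields.contains(name)) {
                skipped.add(name);
            }
        }
        return skipped;
    }

    /** Projects a record written with the writer schema onto the reader's fields. */
    static Map<String, Object> resolve(Map<String, Object> written, List<String> readerFields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (String name : readerFields) {
            if (!written.containsKey(name)) {
                // Real Avro would use the reader field's default value here
                // (and fail only if there is none); this sketch always fails.
                throw new IllegalArgumentException("No value for reader field '" + name + "'");
            }
            result.put(name, written.get(name));
        }
        // Writer-only fields (e.g. 'foo') are never copied: ignored, not an error.
        return result;
    }

    public static void main(String[] args) {
        // A record as written with v1: record FooBar { long foo; string bar; }
        Map<String, Object> v1Record = new LinkedHashMap<>();
        v1Record.put("foo", 42L);
        v1Record.put("bar", "hello");

        // Field names of the v2 read schema: record FooBar { string bar; }
        List<String> v2Fields = List.of("bar");

        System.out.println(resolve(v1Record, v2Fields));                    // {bar=hello}
        System.out.println(fieldsToSkip(List.of("foo", "bar"), v2Fields));  // [foo]
    }
}
{code}

Real Avro fills a reader-only field from its default value; the sketch simply
throws in that case to stay short.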



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
