E. Sammer created CRUNCH-480:
--------------------------------

             Summary: AvroParquetFileSource doesn't properly configure user-supplied read schema
                 Key: CRUNCH-480
                 URL: https://issues.apache.org/jira/browse/CRUNCH-480
             Project: Crunch
          Issue Type: Bug
          Components: IO
    Affects Versions: 0.10.0
            Reporter: E. Sammer
            Priority: Blocker


It seems like AvroParquetFileSource doesn't properly set the configuration 
param required to use a user-supplied read schema that differs from the schema 
in the file.

Deep in the guts of Parquet (InternalParquetRecordReader#initialize()), I found this:
{code}
   this.recordConverter = readSupport.prepareForRead(
        configuration, extraMetadata, fileSchema,
        new ReadSupport.ReadContext(requestedSchema, readSupportMetadata));
{code}

Later, Parquet's AvroReadSupport#prepareForRead() appears to ignore the 
supplied requestedSchema and instead looks for the key avro.read.schema in 
the readSupportMetadata map. This is seriously kooky code in Parquet (i.e. 
wrong), but because Crunch doesn't supply readSupportMetadata, we can never 
properly supply a read schema. Boooo hisssss.
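
To make the failure mode concrete, here is a minimal stand-alone sketch of the lookup as described above. It is not Parquet's actual code: the class ReadSchemaLookup, the method resolveReadSchema(), and the use of plain strings in place of real schema objects are all made up for illustration, and the fall-back to the file schema when the key is absent is an assumption. Only the key name avro.read.schema comes from the report.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the behaviour described in this report.
// Schemas are plain strings here; nothing below is a real Parquet or Crunch API.
final class ReadSchemaLookup {

    // The metadata key the report says prepareForRead() looks for.
    static final String AVRO_READ_SCHEMA_KEY = "avro.read.schema";

    // requestedSchema is accepted but never consulted, mirroring the report.
    // Falling back to fileSchema when the key is missing is an assumption.
    static String resolveReadSchema(String fileSchema,
                                    String requestedSchema,
                                    Map<String, String> readSupportMetadata) {
        String fromMetadata = (readSupportMetadata == null)
                ? null
                : readSupportMetadata.get(AVRO_READ_SCHEMA_KEY);
        return (fromMetadata != null) ? fromMetadata : fileSchema;
    }

    public static void main(String[] args) {
        String fileSchema = "file-schema";
        String userSchema = "user-schema";

        // No readSupportMetadata entry, as the report says happens with Crunch:
        // the user-supplied schema is silently dropped.
        Map<String, String> noMetadata = new HashMap<>();
        System.out.println(resolveReadSchema(fileSchema, userSchema, noMetadata));
        // prints "file-schema"

        // With the key populated, the user-supplied schema is picked up.
        Map<String, String> withKey = new HashMap<>();
        withKey.put(AVRO_READ_SCHEMA_KEY, userSchema);
        System.out.println(resolveReadSchema(fileSchema, userSchema, withKey));
        // prints "user-schema"
    }
}
```

If the model is accurate, the fix on the Crunch side would be to make sure the read schema reaches readSupportMetadata under that key, rather than relying on requestedSchema alone.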



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
