Alex Kozlov created CRUNCH-310:
----------------------------------

             Summary: There should be a way to specify projection schema for Parquet files
                 Key: CRUNCH-310
                 URL: https://issues.apache.org/jira/browse/CRUNCH-310
             Project: Crunch
          Issue Type: Improvement
          Components: IO
            Reporter: Alex Kozlov
            Priority: Critical


Currently the projection schema is set based on the ptype:

{code}
  private static <S> FormatBundle<AvroParquetInputFormat> getBundle(AvroType<S> ptype) {
    return FormatBundle.forInput(AvroParquetInputFormat.class)
        .set(AvroReadSupport.AVRO_REQUESTED_PROJECTION, ptype.getSchema().toString())
        // ParquetRecordReader expects ParquetInputSplits, not FileSplits, so it
        // doesn't work with CombineFileInputFormat
        .set(RuntimeParameters.DISABLE_COMBINE_FILE, "true");
  }
{code}

Sometimes a user wants to read only a subset of the columns.  There should be
a mechanism to supply the desired projection schema explicitly, instead of
always deriving it from the ptype.
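
One possible shape for such a mechanism, as a rough sketch only: accept an
optional projection schema and fall back to the ptype schema when none is
given.  Everything here is a stand-in and not Crunch's API: the class
ProjectionSketch, the method bundleSettings, the two key strings, and the
example Event schema are all hypothetical, and a plain Map takes the place
of FormatBundle so the snippet runs without Crunch or Parquet on the
classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ProjectionSketch {
    // Hypothetical stand-ins for the settings named by
    // AvroReadSupport.AVRO_REQUESTED_PROJECTION and
    // RuntimeParameters.DISABLE_COMBINE_FILE; not the real key strings.
    static final String REQUESTED_PROJECTION = "sketch.requested.projection";
    static final String DISABLE_COMBINE_FILE = "sketch.disable.combine.file";

    /**
     * Builds the input settings. When projectionSchemaJson is null, the full
     * ptype schema is requested, which mirrors the behavior quoted above.
     */
    static Map<String, String> bundleSettings(String ptypeSchemaJson,
                                              String projectionSchemaJson) {
        Map<String, String> settings = new LinkedHashMap<>();
        String requested = (projectionSchemaJson != null)
            ? projectionSchemaJson
            : ptypeSchemaJson;
        settings.put(REQUESTED_PROJECTION, requested);
        settings.put(DISABLE_COMBINE_FILE, "true");
        return settings;
    }

    public static void main(String[] args) {
        // Full record schema, as the ptype would report it (made-up example).
        String full = "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"},"
            + "{\"name\":\"user\",\"type\":\"string\"},"
            + "{\"name\":\"payload\",\"type\":\"bytes\"}]}";
        // Projection that keeps only two of the three columns.
        String projection = "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"},"
            + "{\"name\":\"user\",\"type\":\"string\"}]}";

        // No projection supplied: the full schema is requested.
        System.out.println(bundleSettings(full, null)
            .get(REQUESTED_PROJECTION).equals(full));
        // Projection supplied: it replaces the ptype schema.
        System.out.println(bundleSettings(full, projection)
            .get(REQUESTED_PROJECTION).equals(projection));
    }
}
```

In Crunch itself the same idea would presumably amount to an extra
parameter on getBundle (or on the source that calls it); whether the
projection should be passed as an Avro Schema or in some other form is left
open here.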



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
