Vihang Karajgaonkar created HIVE-17696:
------------------------------------------

             Summary: Vectorized reader does not seem to be pushing down projection columns in certain code paths
                 Key: HIVE-17696
                 URL: https://issues.apache.org/jira/browse/HIVE-17696
             Project: Hive
          Issue Type: Sub-task
            Reporter: Vihang Karajgaonkar


This is the relevant code snippet from {{VectorizedParquetRecordReader.java}}:

{noformat}

    MessageType tableSchema;
    if (indexAccess) {
      List<Integer> indexSequence = new ArrayList<>();

      // Generates a sequence list of indexes
      for(int i = 0; i < columnNamesList.size(); i++) {
        indexSequence.add(i);
      }

      tableSchema = DataWritableReadSupport.getSchemaByIndex(fileSchema, columnNamesList,
        indexSequence);
    } else {
      tableSchema = DataWritableReadSupport.getSchemaByName(fileSchema, columnNamesList,
        columnTypesList);
    }

    indexColumnsWanted = ColumnProjectionUtils.getReadColumnIDs(configuration);
    if (!ColumnProjectionUtils.isReadAllColumns(configuration) && !indexColumnsWanted.isEmpty()) {
      requestedSchema =
        DataWritableReadSupport.getSchemaByIndex(tableSchema, columnNamesList, indexColumnsWanted);
    } else {
      requestedSchema = fileSchema;
    }

    this.reader = new ParquetFileReader(
      configuration, footer.getFileMetaData(), file, blocks, requestedSchema.getColumns());

{noformat}

A couple of things to notice here:

Most of this code is duplicated from the {{DataWritableReadSupport.init()}} method. 
In addition, the else branch assigns {{fileSchema}} to {{requestedSchema}} 
instead of {{tableSchema}}, which is what {{DataWritableReadSupport.init()}} 
uses. Does this cause projection columns to be missed when we read Parquet 
files? We should probably just reuse the {{ReadContext}} returned from 
{{DataWritableReadSupport.init()}} here.
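
To make the concern concrete, here is a toy model of the two fallback choices. It is only a sketch under loud assumptions: a "schema" is reduced to a list of column names, and {{ProjectionSketch}}, {{tableSchema}}, {{project}} and {{requestedSchema}} are hypothetical names, not Hive or Parquet APIs. The real {{getSchemaByName}} and {{getSchemaByIndex}} methods do more than this; the sketch only shows how the chosen fallback changes the set of columns handed to the reader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only: a schema is an ordered list of column names.
// None of these methods are real Hive or Parquet APIs.
final class ProjectionSketch {

    // Simplified stand-in for building the table schema: keep the table's
    // columns that also exist in the file, in table order.
    static List<String> tableSchema(List<String> fileSchema, List<String> tableColumns) {
        List<String> result = new ArrayList<>();
        for (String column : tableColumns) {
            if (fileSchema.contains(column)) {
                result.add(column);
            }
        }
        return result;
    }

    // Simplified stand-in for projecting by column index.
    static List<String> project(List<String> schema, List<Integer> wantedIds) {
        List<String> result = new ArrayList<>();
        for (int id : wantedIds) {
            if (id < schema.size()) {
                result.add(schema.get(id));
            }
        }
        return result;
    }

    // Mirrors the if/else shape of the snippet in this issue. The last flag
    // selects which schema the else branch falls back to.
    static List<String> requestedSchema(List<String> fileSchema, List<String> tableColumns,
                                        boolean readAll, List<Integer> wantedIds,
                                        boolean fallBackToFileSchema) {
        List<String> table = tableSchema(fileSchema, tableColumns);
        if (!readAll && !wantedIds.isEmpty()) {
            return project(table, wantedIds);
        }
        return fallBackToFileSchema ? fileSchema : table;
    }

    public static void main(String[] args) {
        // Hypothetical file with more columns than the table declares.
        List<String> file = Arrays.asList("id", "name", "payload", "debug_blob");
        List<String> table = Arrays.asList("id", "name");
        List<Integer> none = Collections.emptyList();

        // Fallback to the file schema: every file column is requested.
        System.out.println(requestedSchema(file, table, true, none, true));
        // Fallback to the table schema: only the table's columns are requested.
        System.out.println(requestedSchema(file, table, true, none, false));
    }
}
```

In this toy model the two fallbacks differ only when no explicit projection is configured; with a non-empty list of read column IDs both variants project from the table schema and agree. Whether the real reader behaves the same way would need to be confirmed against the Hive code.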



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
