Sergio Peña created HIVE-11611:
----------------------------------

             Summary: A bad performance regression issue with Parquet happens if Hive does not select any columns
                 Key: HIVE-11611
                 URL: https://issues.apache.org/jira/browse/HIVE-11611
             Project: Hive
          Issue Type: Sub-task
    Affects Versions: 2.0.0
            Reporter: Sergio Peña
            Assignee: Ferdinand Xu

A performance problem can occur in the code below when running a query such as {{SELECT count(1) FROM parquetTable}}.

{code}
if (!ColumnProjectionUtils.isReadAllColumns(configuration) && !indexColumnsWanted.isEmpty()) {
  MessageType requestedSchemaByUser =
      getSchemaByIndex(tableSchema, columnNamesList, indexColumnsWanted);
  return new ReadContext(requestedSchemaByUser, contextMetadata);
} else {
  return new ReadContext(tableSchema, contextMetadata);
}
{code}

If no columns or column indexes are selected, {{indexColumnsWanted}} is empty, so the code above falls through to the {{else}} branch and requests the full table schema from Parquet, even though Hive does not use any of those values.
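
One possible shape for a fix is to treat "read all columns" and "no columns wanted" as separate cases, so that an empty selection yields an empty projection rather than the full schema. The following is only a minimal, self-contained sketch of that branching: {{ProjectionSketch}} and {{selectColumns}} are hypothetical names, and a plain list of column names stands in for Parquet's {{MessageType}}. Whether the Parquet reader still reports the row count correctly when given an empty requested schema is an assumption that would need to be verified against the real code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the schema-selection step; this is not Hive's code.
final class ProjectionSketch {

    private ProjectionSketch() {}

    /**
     * Picks the columns to request from the file.
     *
     * @param tableSchema    column names in file order (stand-in for MessageType)
     * @param readAllColumns true when the query needs every column
     * @param wantedIndexes  indexes of the columns the query actually uses
     */
    static List<String> selectColumns(List<String> tableSchema,
                                      boolean readAllColumns,
                                      List<Integer> wantedIndexes) {
        if (readAllColumns) {
            // Case 1: the query really needs the full schema.
            return new ArrayList<>(tableSchema);
        }
        if (wantedIndexes.isEmpty()) {
            // Case 2: e.g. SELECT count(1); no column values are needed,
            // so request nothing instead of falling back to the full schema.
            return Collections.emptyList();
        }
        // Case 3: project only the requested columns, skipping invalid indexes.
        List<String> projected = new ArrayList<>();
        for (int index : wantedIndexes) {
            if (index >= 0 && index < tableSchema.size()) {
                projected.add(tableSchema.get(index));
            }
        }
        return projected;
    }
}
```

With a three-column table, calling {{selectColumns(schema, false, emptyList)}} returns an empty projection, whereas the condition quoted in the description would return all three columns in the same situation.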