Jim Apple has uploaded a new patch set (#5).

Change subject: IMPALA-2840: Don't store table location in partition location
......................................................................

IMPALA-2840: Don't store table location in partition location

For a table with location "ABC", most partitions will have locations
like "ABC/DEF=2". The "ABC" part of the location does not need to be
stored in Catalog for each partition; we can compress it down to one
int in the common case.

This is done by stripping from each partition location the last N
directories (where N is the number of clustering columns) and storing
the resulting string in a cache of partition location prefixes. In the
cache, this location prefix string is mapped to an int. Partition
locations are then stored as a tuple consisting of that int and a
suffix string; the partition location can be reconstructed as the
concatenation of the prefix string (from the cache) and the suffix.

Though this scheme was designed in the expectation that most
partitions will be stored in directories like
"/part_col_1=1.23/part_col_2=234/", it works even when that is not the
case.

TODO: Since each partition stores the literal values for the
partitioning columns, we could also elide the column names and values
when partitions are placed in directories like
"/part_col_1=1.23/part_col_2=234/".

Change-Id: I8c67b6ce0f83de2f5277a528a9ce67e47d638adb
---
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/com/cloudera/impala/analysis/LoadDataStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsPartition.java
A fe/src/main/java/com/cloudera/impala/catalog/HdfsPartitionLocationCompressor.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M tests/metadata/test_ddl.py
M tests/metadata/test_hdfs_encryption.py
11 files changed, 392 insertions(+), 39 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/55/2355/5
--
To view, visit http://gerrit.cloudera.org:8080/2355
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8c67b6ce0f83de2f5277a528a9ce67e47d638adb
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Jim Apple <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
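
For readers who want to see the prefix/suffix scheme from the commit message in concrete form, here is a minimal sketch in Java. It is not the code in HdfsPartitionLocationCompressor.java: the class name LocationPrefixCacheSketch, the nested Compressed type, and the compress/decompress method names are all hypothetical, and the sketch assumes locations are plain '/'-separated strings with no trailing-slash normalization.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the scheme; not the patch's actual classes. */
final class LocationPrefixCacheSketch {
  /** A compressed location: the int assigned to the prefix, plus the suffix. */
  static final class Compressed {
    final int prefixIndex;
    final String suffix;

    Compressed(int prefixIndex, String suffix) {
      this.prefixIndex = prefixIndex;
      this.suffix = suffix;
    }
  }

  private final int numClusteringCols;
  // Cache of location prefixes: prefix string -> int, and the reverse lookup.
  private final Map<String, Integer> prefixToIndex = new HashMap<>();
  private final List<String> indexToPrefix = new ArrayList<>();

  LocationPrefixCacheSketch(int numClusteringCols) {
    this.numClusteringCols = numClusteringCols;
  }

  /** Strips the last N directories and stores the remaining prefix in the cache. */
  Compressed compress(String location) {
    int cut = location.length();
    // Walk backwards over at most N '/'-separated components.
    for (int i = 0; i < numClusteringCols && cut > 0; i++) {
      int slash = location.lastIndexOf('/', cut - 1);
      if (slash < 0) {
        // Fewer than N directories: the whole string becomes the suffix.
        cut = 0;
        break;
      }
      cut = slash;
    }
    String prefix = location.substring(0, cut);
    String suffix = location.substring(cut);
    Integer index = prefixToIndex.get(prefix);
    if (index == null) {
      index = indexToPrefix.size();
      indexToPrefix.add(prefix);
      prefixToIndex.put(prefix, index);
    }
    return new Compressed(index, suffix);
  }

  /** Reconstructs the location as cached prefix + suffix. */
  String decompress(Compressed c) {
    return indexToPrefix.get(c.prefixIndex) + c.suffix;
  }
}
```

With one clustering column, "ABC/DEF=2" and "ABC/DEF=3" would share a single cached prefix "ABC" and differ only in their suffixes. A location that does not follow the "col=value" layout still round-trips, because the split is purely positional and never inspects directory names.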
