Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-2840: Don't store table location in partition location ......................................................................
Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/2355/5/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: Line 38: namespace Out of curiosity, why don't you put this function under impala namespace? Line 41: // format that references the 'partition_prefixes' of 'thrift_table'. Can you also comment on the return value when location isn't set? http://gerrit.cloudera.org:8080/#/c/2355/5/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: Line 223: HDFS partition's location There is no reason why we can't use this to compress other locations as well. For example: hdfs://host_name/my_database/table1 and hdfs://host_name/my_database/tabl2 share a pretty long prefix. Maybe leave a TODO somewhere to generalize this functionality. http://gerrit.cloudera.org:8080/#/c/2355/5/fe/src/main/java/com/cloudera/impala/catalog/HdfsPartitionLocationCompressor.java File fe/src/main/java/com/cloudera/impala/catalog/HdfsPartitionLocationCompressor.java: Line 28: This class is not thread-safe. Clients of this class need to protect against : * concurrent updates using external locking. : * : * Owned by HdfsTable instance. Since this is owned by HdfsTable and the catalog uses table-level locking, we don't need to synchronize accesses to instances of this class. Can you update the comment? Line 35: numClusteringColumns_ This doesn't seem to be used anywhere besides getter and setter. Why storing it in this class? Line 39: indexToPrefix This is a bit misleading as it sort of indicates that this is a map. Maybe call it prefixes_ instead? Line 88: public class Location How about the following idea for organizing this code: 1. HdfsPartitionLocationCompressor has a compress method that takes a string (partition location) and returns a Pair<Integer, String> representing a prefix-suffix pair. 2. It also has a decompress method that converts a Pair<Integer, String> into a partition location string. 3. Location class is deleted. Each partition can simply store the prefix-suffix pair. HdfsPartition has two util methods compress/decompress that simply delegate the work to the parent table object. -- To view, visit http://gerrit.cloudera.org:8080/2355 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c67b6ce0f83de2f5277a528a9ce67e47d638adb Gerrit-PatchSet: 5 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Jim Apple <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Jim Apple <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-HasComments: Yes
