Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-2840: Don't store table location in partition location ......................................................................
Patch Set 2: (11 comments) http://gerrit.cloudera.org:8080/#/c/2355/2//COMMIT_MSG Commit Message: Line 9: For a table with location "ABC", most partitions will have locations : like "ABC/DEF=2". The "ABC" part of the location does not need to be : stored in Catalog for each partition; we can compress it down to one : bit in the common case. I agree that right now this is the common case, but I am wondering if this is a short term solution. For example, think of use cases where a user keeps some partitions in hdfs and some in S3 or even in a different direction than the hdfs base dir. We will have to deal with the same problem again because no compression is achieved in these cases. Thoughts? http://gerrit.cloudera.org:8080/#/c/2355/2/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: Line 225: protion portion http://gerrit.cloudera.org:8080/#/c/2355/2/fe/src/main/java/com/cloudera/impala/catalog/HdfsPartition.java File fe/src/main/java/com/cloudera/impala/catalog/HdfsPartition.java: Line 73: private boolean is_relative_; : private String suffix_; Make them both final Line 76: HdfsTable table What will happen if after a partition is created I do an alter table set location and change the table location to a different directory? Are we certain that we will not be generating invalid locations for the partitions when we call toString()? In general we need to make sure that alter locations for both tables and partitions don't result in stale metadata. We may have to add a few tests for these scenarios. Line 86: Location(THdfsPartitionLocation thrift) nit: we typically have static fromThrift() functions. Line 119: public int compareTo(Location o) { Can you add a comment on the imposed order (relative last, etc)? Line 445: Location getRawLocation() There is a minor issue with this. I can call getRawLocation() on a partition and then call toString() and pass any HdfsTable object I want, producing a bogus partition path. One way to address this would be to change the Location class to be an inner class so that you don't have to pass HdfsTable in the toString() function because you already get access to the parent class at the expense of an additional reference for every location object. Thoughts? Line 635: nit: why the empty lines? Line 764: ? new Location(thriftPartition.getLocation()) : : null; nit: can you merge into a single line? Line 811: if (location_ != null) { : thriftHdfsPart.setLocation(location_.toThrift()); : } nit: if possible, single line http://gerrit.cloudera.org:8080/#/c/2355/2/fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java File fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java: Line 245: THdfsTable table = findTable(entry.getKey()); : THdfsPartition partition = table.getPartitions().get(split.partition_id); Isn't this equivalent to findPartition()? -- To view, visit http://gerrit.cloudera.org:8080/2355 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c67b6ce0f83de2f5277a528a9ce67e47d638adb Gerrit-PatchSet: 2 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Jim Apple <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Jim Apple <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-HasComments: Yes
