Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-2840: Don't store table location in partition location
......................................................................


Patch Set 2:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/2355/2//COMMIT_MSG
Commit Message:

Line 9: For a table with location "ABC", most partitions will have locations
      : like "ABC/DEF=2". The "ABC" part of the location does not need to be
      : stored in Catalog for each partition; we can compress it down to one
      : bit in the common case.
I agree that right now this is the common case, but I am wondering if this is a 
short term solution. For example, think of use cases where a user keeps some 
partitions in hdfs and some in S3 or even in a different direction than the 
hdfs base dir. We will have to deal with the same problem again because no 
compression is achieved in these cases. Thoughts?


http://gerrit.cloudera.org:8080/#/c/2355/2/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 225: protion
portion


http://gerrit.cloudera.org:8080/#/c/2355/2/fe/src/main/java/com/cloudera/impala/catalog/HdfsPartition.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsPartition.java:

Line 73: private boolean is_relative_;
       :     private String suffix_;
Make them both final


Line 76: HdfsTable table
What will happen if after a partition is created I do an alter table set 
location and change the table location to a different directory? Are we certain 
that we will not be generating invalid locations for the partitions when we 
call toString()? In general we need to make sure that alter locations for both 
tables and partitions don't result in stale metadata. We may have to add a few 
tests for these scenarios.


Line 86: Location(THdfsPartitionLocation thrift)
nit: we typically have static fromThrift() functions.


Line 119: public int compareTo(Location o) {
Can you add a comment on the imposed order (relative last, etc)?


Line 445: Location getRawLocation() 
There is a minor issue with this. I can call getRawLocation() on a partition 
and then call toString() and pass any HdfsTable object I want, producing a 
bogus partition path. One way to address this would be to change the Location 
class to be an inner class so that you don't have to pass HdfsTable in the 
toString() function because you already get access to the parent class at the 
expense of an additional reference for every location object. Thoughts?


Line 635: 
nit: why the empty lines?


Line 764: ? new Location(thriftPartition.getLocation())
        :         : null;
nit: can you merge into a single line?


Line 811: if (location_ != null) {
        :       thriftHdfsPart.setLocation(location_.toThrift());
        :     }
nit: if possible, single line


http://gerrit.cloudera.org:8080/#/c/2355/2/fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java
File fe/src/test/java/com/cloudera/impala/planner/PlannerTestBase.java:

Line 245: THdfsTable table = findTable(entry.getKey());
        :           THdfsPartition partition = 
table.getPartitions().get(split.partition_id);
Isn't this equivalent to findPartition()?


-- 
To view, visit http://gerrit.cloudera.org:8080/2355
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8c67b6ce0f83de2f5277a528a9ce67e47d638adb
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Jim Apple <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-HasComments: Yes

Reply via email to