Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 3:

(18 comments)

You also need to add some analysis tests in AnalyzerTest.java (fn 
TestResetMetadata()).

http://gerrit.cloudera.org:8080/#/c/3385/3//COMMIT_MSG
Commit Message:

PS3, Line 13: query format
syntax


http://gerrit.cloudera.org:8080/#/c/3385/3/common/thrift/CatalogService.thrift
File common/thrift/CatalogService.thrift:

PS3, Line 190: 5: optional list<CatalogObjects.TPartitionKeyValue> 
partition_spec
Can you plz add a comment?


http://gerrit.cloudera.org:8080/#/c/3385/3/fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java:

PS3, Line 35: //null if refreshing the whole table or invalidating the entire 
catalog.
Let's reverse this comment :) "not-null when ...."


PS3, Line 41: Preconditions.checkArgument(partitionSpec == null
            :         || (partitionSpec != null && name != null && 
!name.isEmpty()));
isn't this equivalent to partitionSpec == null || name != null? I believe the 
indention of this check should be that you can't have a partition spec without 
a valid table name, no?


PS3, Line 46: if (partitionSpec_ != null) {
            :       partitionSpec_.setTableName(tableName_);
            :     }
single line?


http://gerrit.cloudera.org:8080/#/c/3385/3/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

PS3, Line 1175: HdfsPartition hdfspartition = hdfsTable
              :         .getPartitionFromThriftPartitionSpec(partitionSpec);
              :     // For the case where partition does not exist in Catalog 
cache but might
              :     // exist in Hive Metastore
              :     String partitionName = hdfspartition == null
              :         ? 
hdfsTable.getPartitionNameFromThriftPartitionSpec(partitionSpec)
              :         : hdfspartition.getPartitionName();
You seem to be accessing partition info from a table without protecting it from 
other concurrent accesses (e.g. someone could be dropping the partition you're 
trying to refresh). I believe this section should be inside the synchronized 
block.


PS3, Line 1200: dropPartition(tbl, partitionSpec);
Why are you calling the dropPartition from this class and not the 
HdfsTable.dropPartition()?


PS3, Line 1208: dropPartition(tbl, partitionSpec);
              :         refreshedPartition = 
hdfsTable.createPartition(partition.getSd(), partition);
              :         addPartition(refreshedPartition);
You're exposing the process of refreshing a table partition to the reader of 
this class, which is not ideal. This task belongs to the table, so I would 
expect something like hdfsTable.refreshPartition(partition); call here.


http://gerrit.cloudera.org:8080/#/c/3385/3/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java:

PS3, Line 1921: this.
Remove 'this' from here and everywhere else, it's not needed.


PS3, Line 1927: kv.getValue() == null || kv.getValue().isEmpty()
You can probably use Guava's Strings.isNullOrEmpty(kv.getValue()) here


PS3, Line 1933: if(partitionCols.size()!=partitionVals.size()) return null;
Are you sure this is possible? Won't we catch it during the analysis of the 
PartitionSpec? If that's the case, we should convert this into a check.


Line 1936:         partitionCols,partitionVals);
nit: space after ','


http://gerrit.cloudera.org:8080/#/c/3385/3/tests/metadata/test_refresh_partition.py
File tests/metadata/test_refresh_partition.py:

PS3, Line 33: class TestRefreshPartition(ImpalaTestSuite):
Comment what this class is testing


Line 104: 
nit: remove empty line


PS3, Line 116: """
             :     Partitions added in Hive can be viewed in Impala after 
refreshing
             :     partition.
             :     """
nit: can you keep the comment formatting consistent (e.g. """blah wwfwfw"""


PS3, Line 137: Partitions
"Partition"


PS3, Line 158: test_add_data_and_refresh
Another interesting test is to create 2 partitions, insert data through hive to 
both of them, refresh one and ensure that only data in one partition is visible 
to impala.


Line 267:     assert result.data == [str(file_num_rows)]
You can also continue that test by removing the data file, calling refresh and 
validating that no rows get returned to Impala.


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: David Knupp <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-HasComments: Yes

Reply via email to