Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-1683: Allow REFRESH on a single partition ......................................................................
Patch Set 3: (18 comments) You also need to add some analysis tests in AnalyzerTest.java (fn TestResetMetadata()). http://gerrit.cloudera.org:8080/#/c/3385/3//COMMIT_MSG Commit Message: PS3, Line 13: query format syntax http://gerrit.cloudera.org:8080/#/c/3385/3/common/thrift/CatalogService.thrift File common/thrift/CatalogService.thrift: PS3, Line 190: 5: optional list<CatalogObjects.TPartitionKeyValue> partition_spec Can you plz add a comment? http://gerrit.cloudera.org:8080/#/c/3385/3/fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java File fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java: PS3, Line 35: //null if refreshing the whole table or invalidating the entire catalog. Let's reverse this comment :) "not-null when ...." PS3, Line 41: Preconditions.checkArgument(partitionSpec == null : || (partitionSpec != null && name != null && !name.isEmpty())); isn't this equivalent to partitionSpec == null || name != null? I believe the indention of this check should be that you can't have a partition spec without a valid table name, no? PS3, Line 46: if (partitionSpec_ != null) { : partitionSpec_.setTableName(tableName_); : } single line? http://gerrit.cloudera.org:8080/#/c/3385/3/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java: PS3, Line 1175: HdfsPartition hdfspartition = hdfsTable : .getPartitionFromThriftPartitionSpec(partitionSpec); : // For the case where partition does not exist in Catalog cache but might : // exist in Hive Metastore : String partitionName = hdfspartition == null : ? hdfsTable.getPartitionNameFromThriftPartitionSpec(partitionSpec) : : hdfspartition.getPartitionName(); You seem to be accessing partition info from a table without protecting it from other concurrent accesses (e.g. someone could be dropping the partition you're trying to refresh). I believe this section should be inside the synchronized block. PS3, Line 1200: dropPartition(tbl, partitionSpec); Why are you calling the dropPartition from this class and not the HdfsTable.dropPartition()? PS3, Line 1208: dropPartition(tbl, partitionSpec); : refreshedPartition = hdfsTable.createPartition(partition.getSd(), partition); : addPartition(refreshedPartition); You're exposing the process of refreshing a table partition to the reader of this class, which is not ideal. This task belongs to the table, so I would expect something like hdfsTable.refreshPartition(partition); call here. http://gerrit.cloudera.org:8080/#/c/3385/3/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java: PS3, Line 1921: this. Remove 'this' from here and everywhere else, it's not needed. PS3, Line 1927: kv.getValue() == null || kv.getValue().isEmpty() You can probably use Guava's Strings.isNullOrEmpty(kv.getValue()) here PS3, Line 1933: if(partitionCols.size()!=partitionVals.size()) return null; Are you sure this is possible? Won't we catch it during the analysis of the PartitionSpec? If that's the case, we should convert this into a check. Line 1936: partitionCols,partitionVals); nit: space after ',' http://gerrit.cloudera.org:8080/#/c/3385/3/tests/metadata/test_refresh_partition.py File tests/metadata/test_refresh_partition.py: PS3, Line 33: class TestRefreshPartition(ImpalaTestSuite): Comment what this class is testing Line 104: nit: remove empty line PS3, Line 116: """ : Partitions added in Hive can be viewed in Impala after refreshing : partition. : """ nit: can you keep the comment formatting consistent (e.g. """blah wwfwfw""" PS3, Line 137: Partitions "Partition" PS3, Line 158: test_add_data_and_refresh Another interesting test is to create 2 partitions, insert data through hive to both of them, refresh one and ensure that only data in one partition is visible to impala. Line 267: assert result.data == [str(file_num_rows)] You can also continue that test by removing the data file, calling refresh and validating that no rows get returned to Impala. -- To view, visit http://gerrit.cloudera.org:8080/3385 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8 Gerrit-PatchSet: 3 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: David Knupp <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-HasComments: Yes
