Alex Behm has posted comments on this change. Change subject: IMPALA-1654: Partition expr in DDL operations. ......................................................................
Patch Set 4: (32 comments) I'll take another pass over the tests once we're in good shape with the code. General question: There are few tasks related to this patch that I think need follow-on work. Are you willing to work on them? The tasks are as follows: - Use PartitionSet in REFRESH <tbl> PARTITION () - IMPALA-4033 - Use Metastore Bulk API for dropping several partitions http://gerrit.cloudera.org:8080/#/c/3942/4/fe/src/main/java/com/cloudera/impala/analysis/AlterTableSetLocationStmt.java File fe/src/main/java/com/cloudera/impala/analysis/AlterTableSetLocationStmt.java: Line 42: public static final int NUM_PARTITION_LOG_LIMIT = 3; private Line 86: num++; style: ++num http://gerrit.cloudera.org:8080/#/c/3942/4/fe/src/main/java/com/cloudera/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/com/cloudera/impala/analysis/ComputeStatsStmt.java: Line 360: for(HdfsPartition targetPartition: targetPartitions){ style: space after "for" Line 530: partitionSet_ == null ? "" : partitionSet_.toSql(); style: indent 4 from parent (like it was before) http://gerrit.cloudera.org:8080/#/c/3942/4/fe/src/main/java/com/cloudera/impala/analysis/PartitionSet.java File fe/src/main/java/com/cloudera/impala/analysis/PartitionSet.java: Line 33: // Set during analysis // Result of analysis Line 44: Preconditions.checkNotNull(tableName_); please move the analysis code that is common between PartitionSet and PartitionSpec into PartitionSpecBase.analyze() Line 46: analyzer.setEnablePrivChecks(false); needs a brief comment like: Do not register column-authorization requests. also this should be done as close as possible to where the auth requests would be registered, i.e. right before L63 Line 65: // SlotRef should be registered before getting slot ids remove, should be clear why analysis is needed Line 80: // Make sure every conjunct only contains partition slot refs. Would be cleaner to check this with conjunct.isBoundBySlotIds() like in the PartitionPruner. Here are the relevant snippets from PartitionPruner: Might be worth moving this slot collection into a helper function in TupleDescriptor or Analyzer. // Collect all the partitioning columns from TupleDescriptor. for (SlotDescriptor slotDesc: tupleDesc.getSlots()) { if (slotDesc.getColumn() == null) continue; if (slotDesc.getColumn().getPosition() < tbl_.getNumClusteringCols()) { partitionSlots_.add(slotDesc.getId()); } } Then the check looks like this: if (!conjunct.isBoundBySlotIds(partitionSlots_)) { // report error and print offending conjunct } Line 117: // just as before, while more general partition expressions or partially-specified just as before -> for backwards compatibility PS4, Line 118: simply discard -> ignore the comment should describe side effects, i.e., that partitionShouldExists_ is set to null if it should be ignores Line 119: private void CheckIgnoreIfExists(Table table, List<Expr> transformedConjuncts) { checkIgnoreIfExists() Line 121: Expr logExpr = null; don't think we really need to log any offending expr Line 122: Set<String> columnNames = Sets.newHashSet(); partColNames Line 125: if (Predicate.isEquivalencePredicate(e)) { not quite correct: it must be an equality predicate the current code also allows <=> probably simpler to check instanceof BinaryPredicate Line 128: columnNames.add(slotRef.getRef().getDesc().getColumn().getName()); add a Preconditions check to show that slotRef must belong to a partition column Line 136: columnNames.add(nullPredicate.getBoundSlot().getDesc().getColumn().getName()); add same Preconditions check here Line 146: LOG.info(String.format( Instead of this logging I think we should analyzer.addWarning(), but only if the user actually specified IF EXISTS in the query. Line 149: } else if (columnNames.size() < table.getMetaStoreTable().getPartitionKeysSize()) { use table.getNumClusteringCols() Line 150: // we don't need to check the non-partition columns here since it's already been remove comment, should be clear after you've made the other changes above Line 169: if (Predicate.isEquivalencePredicate(e)) { this will also transform <=> predicates, let's limit it to = predicates only Line 202: String key = ToSqlUtils.getIdentSql(hdfsTable.getColumns() Why do we need to use getIdentSql() here? Are you sure we need to quote identifiers? For example, a column like "table" will be converted to "`table`" because "table" is an Impala keyword. I don't think we want to do that conversion here. http://gerrit.cloudera.org:8080/#/c/3942/4/fe/src/main/java/com/cloudera/impala/analysis/PartitionSpecBase.java File fe/src/main/java/com/cloudera/impala/analysis/PartitionSpecBase.java: Line 1: package com.cloudera.impala.analysis; Apache copyright Line 9: * Created by amos on 8/18/16. Replace this with a brief description of what this class is. Line 45: // Set the privilege requirement for this partition spec. Must be set prior to newline before http://gerrit.cloudera.org:8080/#/c/3942/4/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java: Line 893: * CatalogException if 'tbl' is not an HdfsTable. If the partitions in the given update comment: null is never returned http://gerrit.cloudera.org:8080/#/c/3942/4/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java: Line 1021: if (hdfsPartition != null) { single line if it fits http://gerrit.cloudera.org:8080/#/c/3942/4/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java: Line 1030: if (partitions == null) { getPartitionsFromPartitionSet() will never return null so I think you want to check isEmpty() instead we should eventually address the TODO to return info to the user, but not in this patch (already big enough :) ) Line 1036: for(HdfsPartition partition: partitions) { style: space after "for" Line 1748: * permanently deleted. I think we should definitely move to the bulk API before we include this change in a release. However, I'm ok with doing that in a separate patch. I looked at the API and it is indeed a little strange. If we defer this task, are you willing to make the follow-on changes? If so, please add a TODO here. I can file JIRAs for the various follow-on tasks and it would be awesome if you wanted to work on some of them :) Line 1758: if (!ifExists) { Let's change this logic and add Preconditions checks to reflect our new constraints, e.g.: we can only have ifExists if there is exactly one partition in the partition set. If the partition set is empty, we can definitely not have ifExists. http://gerrit.cloudera.org:8080/#/c/3942/4/testdata/workloads/functional-query/queries/QueryTest/alter-table.test File testdata/workloads/functional-query/queries/QueryTest/alter-table.test: Line 251: 'Updated 1 table.' remove the "1" since it may be misleading (we cannot update more than one table at once) -- To view, visit http://gerrit.cloudera.org:8080/3942 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2c9162fcf9d227b8daf4c2e761d57bab4e26408f Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Amos Bird <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Amos Bird <[email protected]> Gerrit-HasComments: Yes
