[Impala-ASF-CR] IMPALA-1654: Partition expr in DDL operations.

Alex Behm (Code Review) Fri, 02 Sep 2016 16:24:23 -0700

Alex Behm has posted comments on this change.

Change subject: IMPALA-1654: Partition expr in DDL operations.
......................................................................



Patch Set 4:

(32 comments)

I'll take another pass over the tests once we're in good shape with the code.

General question: There are few tasks related to this patch that I think need 
follow-on work. Are you willing to work on them? The tasks are as follows:

- Use PartitionSet in REFRESH <tbl> PARTITION ()
- IMPALA-4033
- Use Metastore Bulk API for dropping several partitions

http://gerrit.cloudera.org:8080/#/c/3942/4/fe/src/main/java/com/cloudera/impala/analysis/AlterTableSetLocationStmt.java
File 
fe/src/main/java/com/cloudera/impala/analysis/AlterTableSetLocationStmt.java:

Line 42:   public static final int NUM_PARTITION_LOG_LIMIT = 3;
private


Line 86:             num++;
style: ++num


http://gerrit.cloudera.org:8080/#/c/3942/4/fe/src/main/java/com/cloudera/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/ComputeStatsStmt.java:

Line 360:         for(HdfsPartition targetPartition: targetPartitions){
style: space after "for"


Line 530:              partitionSet_ == null ? "" : partitionSet_.toSql();
style: indent 4 from parent (like it was before)


http://gerrit.cloudera.org:8080/#/c/3942/4/fe/src/main/java/com/cloudera/impala/analysis/PartitionSet.java
File fe/src/main/java/com/cloudera/impala/analysis/PartitionSet.java:

Line 33:   // Set during analysis
// Result of analysis


Line 44:     Preconditions.checkNotNull(tableName_);
please move the analysis code that is common between PartitionSet and 
PartitionSpec into PartitionSpecBase.analyze()


Line 46:     analyzer.setEnablePrivChecks(false);
needs a brief comment like: Do not register column-authorization requests.

also this should be done as close as possible to where the auth requests would 
be registered, i.e. right before L63


Line 65:       // SlotRef should be registered before getting slot ids
remove, should be clear why analysis is needed


Line 80:     // Make sure every conjunct only contains partition slot refs.
Would be cleaner to check this with conjunct.isBoundBySlotIds() like in the 
PartitionPruner.

Here are the relevant snippets from PartitionPruner:

Might be worth moving this slot collection into a helper function in 
TupleDescriptor or Analyzer.

// Collect all the partitioning columns from TupleDescriptor.
for (SlotDescriptor slotDesc: tupleDesc.getSlots()) {
  if (slotDesc.getColumn() == null) continue;
  if (slotDesc.getColumn().getPosition() < tbl_.getNumClusteringCols()) {
    partitionSlots_.add(slotDesc.getId());
  }
}

Then the check looks like this:
if (!conjunct.isBoundBySlotIds(partitionSlots_)) {
 // report error and print offending conjunct
}


Line 117:   // just as before, while more general partition expressions or 
partially-specified
just as before -> for backwards compatibility


PS4, Line 118:  
simply discard -> ignore

the comment should describe side effects, i.e., that partitionShouldExists_ is 
set to null if it should be ignores


Line 119:   private void CheckIgnoreIfExists(Table table, List<Expr> 
transformedConjuncts) {
checkIgnoreIfExists()


Line 121:     Expr logExpr = null;
don't think we really need to log any offending expr


Line 122:     Set<String> columnNames = Sets.newHashSet();
partColNames


Line 125:       if (Predicate.isEquivalencePredicate(e)) {
not quite correct: it must be an equality predicate

the current code also allows <=>

probably simpler to check instanceof BinaryPredicate


Line 128:           
columnNames.add(slotRef.getRef().getDesc().getColumn().getName());
add a Preconditions check to show that slotRef must belong to a partition column


Line 136:         
columnNames.add(nullPredicate.getBoundSlot().getDesc().getColumn().getName());
add same Preconditions check here


Line 146:       LOG.info(String.format(
Instead of this logging I think we should analyzer.addWarning(), but only if 
the user actually specified IF EXISTS in the query.


Line 149:     } else if (columnNames.size() < 
table.getMetaStoreTable().getPartitionKeysSize()) {
use table.getNumClusteringCols()


Line 150:       // we don't need to check the non-partition columns here since 
it's already been
remove comment, should be clear after you've made the other changes above


Line 169:       if (Predicate.isEquivalencePredicate(e)) {
this will also transform <=> predicates, let's limit it to = predicates only


Line 202:         String key = ToSqlUtils.getIdentSql(hdfsTable.getColumns()
Why do we need to use getIdentSql() here? Are you sure we need to quote 
identifiers?

For example, a column like "table" will be converted to "`table`" because 
"table" is an Impala keyword. I don't think we want to do that conversion here.


http://gerrit.cloudera.org:8080/#/c/3942/4/fe/src/main/java/com/cloudera/impala/analysis/PartitionSpecBase.java
File fe/src/main/java/com/cloudera/impala/analysis/PartitionSpecBase.java:

Line 1: package com.cloudera.impala.analysis;
Apache copyright


Line 9:  * Created by amos on 8/18/16.
Replace this with a brief description of what this class is.


Line 45:   // Set the privilege requirement for this partition spec. Must be 
set prior to
newline before


http://gerrit.cloudera.org:8080/#/c/3942/4/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 893:    * CatalogException if 'tbl' is not an HdfsTable. If the partitions 
in the given
update comment: null is never returned


http://gerrit.cloudera.org:8080/#/c/3942/4/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java:

Line 1021:       if (hdfsPartition != null) {
single line if it fits


http://gerrit.cloudera.org:8080/#/c/3942/4/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java:

Line 1030:         if (partitions == null) {
getPartitionsFromPartitionSet() will never return null so I think you want to 
check isEmpty() instead

we should eventually address the TODO to return info to the user, but not in 
this patch (already big enough :) )


Line 1036:         for(HdfsPartition partition: partitions) {
style: space after "for"


Line 1748:    * permanently deleted.
I think we should definitely move to the bulk API before we include this change 
in a release. However, I'm ok with doing that in a separate patch.

I looked at the API and it is indeed a little strange.

If we defer this task, are you willing to make the follow-on changes? If so, 
please add a TODO here.

I can file JIRAs for the various follow-on tasks and it would be awesome if you 
wanted to work on some of them :)


Line 1758:     if (!ifExists) {
Let's change this logic and add Preconditions checks to reflect our new 
constraints, e.g.: we can only have ifExists if there is exactly one partition 
in the partition set.

If the partition set is empty, we can definitely not have ifExists.


http://gerrit.cloudera.org:8080/#/c/3942/4/testdata/workloads/functional-query/queries/QueryTest/alter-table.test
File testdata/workloads/functional-query/queries/QueryTest/alter-table.test:

Line 251: 'Updated 1 table.'
remove the "1" since it may be misleading (we cannot update more than one table 
at once)


-- 
To view, visit http://gerrit.cloudera.org:8080/3942
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2c9162fcf9d227b8daf4c2e761d57bab4e26408f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Amos Bird <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Amos Bird <[email protected]>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1654: Partition expr in DDL operations.

Reply via email to