Alex Behm has posted comments on this change. Change subject: IMPALA-1654: Partition expr in DDL operations. ......................................................................
Patch Set 1: (36 comments) http://gerrit.cloudera.org:8080/#/c/3942/1/be/src/service/query-exec-state.cc File be/src/service/query-exec-state.cc: Line 937: void ImpalaServer::QueryExecState::SetResultSet() { make const TDdlExecResponse* a param to this function to make it clear when it makes sense to call this http://gerrit.cloudera.org:8080/#/c/3942/1/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: Line 897: // Need to check partition is an opt_partition_spec in analyzer to avoid suggest minor rephrase: // Need to check in analysis that the partition set only matches a single partition. Avoids a reduce/reduce conflict and allows the user to select a partition without fully specifying all partition-key values. Line 1562: // A partition spec is a set of expressions used to select a list of partitions Update comment (still says 'partition spec') http://gerrit.cloudera.org:8080/#/c/3942/1/fe/src/main/java/com/cloudera/impala/analysis/AlterTableSetLocationStmt.java File fe/src/main/java/com/cloudera/impala/analysis/AlterTableSetLocationStmt.java: Line 79: "can only match one partition."); in the error message include the names of the matched partitions http://gerrit.cloudera.org:8080/#/c/3942/1/fe/src/main/java/com/cloudera/impala/analysis/AlterTableSetStmt.java File fe/src/main/java/com/cloudera/impala/analysis/AlterTableSetStmt.java: Line 37: PartitionSet getPartitionSet() { single line http://gerrit.cloudera.org:8080/#/c/3942/1/fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java File fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java: Line 442: analysisResult_.isAlterTableStmt() || Please take a look at my comment on the previous patch set. Summary: We should remove these checks and temporarily disable auth when analyzing a PartitionSet. http://gerrit.cloudera.org:8080/#/c/3942/1/fe/src/main/java/com/cloudera/impala/analysis/PartitionSet.java File fe/src/main/java/com/cloudera/impala/analysis/PartitionSet.java: Line 20: * Represents a partition set - a collection of partition expressions. Represents a set of partitions resulting from evaluating a list of partition conjuncts against a table's partition list. Line 31: // does or does not contain any partition matched by the given partition expr. exprs Line 34: private Privilege privilegeRequirement_; There are a bunch of common members and analysis code between PartitionSpec and PartitionSet. Some of the analysis code is subtle, so it would be good to factor out the commonalities into a PartitionSpecBase. Line 44: public String getTbl() { single lines where possible Line 93: "Dynamic partition spec is not supported."); Error is somewhat cryptic. It should contain: 1. The offending expression 2. An understandable explanation for the failure and what the expected input is Maybe something like the following, you get the idea: Invalid column reference 'expr.toSql() goes here' in partition spec. A partition spec may only contain predicates. Line 99: // Verify the partition exprs. Make sure every conjunct only contains exactly Update comment. Line 103: for (Expr e : conjuncts) { We still need to check that e only references partition columns - and no non-partition columns. Also add a test for that. Line 105: if (e.isLiteral()) { I think it's reasonable to disallow constant expressions here, so you can just check e.isConstant(). In that case I think we should throw rather than just issue a warning. Line 132: // Transform <COL> = NULL into IsNull expr; <String COL> = '' into IsNull expr and The NULL transformation looks good, but I don't think the string transformation is necessary or correct. The partition pruner should already be able to handle the string_col = '' case, and string partition-key values are case sensitive, so it's not correct to lower case them. http://gerrit.cloudera.org:8080/#/c/3942/1/fe/src/main/java/com/cloudera/impala/catalog/Catalog.java File fe/src/main/java/com/cloudera/impala/catalog/Catalog.java: Line 390: public List<HdfsPartition> getHdfsPartitions(Table tbl, do we really need this function? seems like the caller could do tbl.getPartitionsFromPartitionSet() directly http://gerrit.cloudera.org:8080/#/c/3942/1/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java: Line 892: * Drops the partition specified in 'partitionSpec' from 'tbl'. Throws a Update comment Line 897: public Table dropPartitions(Table tbl, List<List<TPartitionKeyValue>> partitionIds) partitionIds -> partitionSet http://gerrit.cloudera.org:8080/#/c/3942/1/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java: Line 629: * Gets hdfs partitions by partition id list. Update comment, remove @param Line 983: * Drops the partitions from HdfsTable. Cleans up its metadata from all the mappings Drops the given partitions from this table... Line 985: * Returns a list of partitions been dropped. Returns the list of partitions that were dropped. Line 1923: * @param partitionSet remove, we generally don't use the @param stuff Line 1939: // Get a list of HdfsPartition objects for the given partition ids. partition ids -> partition set please also fix elsewhere http://gerrit.cloudera.org:8080/#/c/3942/1/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java: Line 450: alterTableSetTblProperties(tbl, please condense these lengthy function calls to wrap around like this: function(a, b, c, d, e, f, g, h, i, j); Line 465: alterTableUpdateStats( condense wrapping Line 486: alterPartitionSetCached(tbl, wrapping Line 503: loadTableMetadata(tbl, wrapping Line 1784: * Drops existing partitions from the given table in Hive. If the partition is cached, If a partition is cached ... Line 1810: List<HdfsPartition> parts = catalog_.getHdfsPartitions(tbl, partitionSet); why not use tbl.getPartitionsFromPartitionSet() directly and get rid of Catalog? Line 1834: } catch (NoSuchObjectException e) { The error behavior is a somewhat strange now because a user can write ALTER TABLE t DROP PARTITION (year < 2010) and then have that statement fail because partition 1998 did not exist. By definition the user only wants to drop partitions that exist, so the new more general case seems different from the original fully-specified single partition. How does Hive behave with respect to IF EXISTS when non-equality predicate are used in a partition set? What do you think is the right behavior with respect to IF EXISTS? Line 1969: long numTargetPartitions = 0L; unused? Line 1978: numUpdatedPartitions.setRef(numTargetPartitions); can't you set to modifiedParts.size() directly? Line 2057: numUpdatedPartitions.setRef(numTargetPartitions); use modifiedParts.size() directly? Line 2246: ++ numTargetPartitions; extra space Line 2616: private void bulkDropPartitions(String dbName, String tableName, please remove this work-in-progress from the patch set, I'm still thinking about whether we need to switch to the bulk api immediately http://gerrit.cloudera.org:8080/#/c/3942/1/shell/impala_client.py File shell/impala_client.py: Line 450: excluded_query_types = ['use', 'drop'] Did you do the manual testing I suggested in the last round? -- To view, visit http://gerrit.cloudera.org:8080/3942 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2c9162fcf9d227b8daf4c2e761d57bab4e26408f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Amos Bird <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-HasComments: Yes
