Alex Behm has posted comments on this change. Change subject: IMPALA-1654: Support general predicates in most partition DDL operations. ......................................................................
Patch Set 9: (8 comments) Agree with all points except 1. Let's not start changing the roles/separation between the components of the system. I'm not saying we should never do that, but let's not do that in this patch. See some reasons in my response in Frontend.thrift. http://gerrit.cloudera.org:8080/#/c/1563/9/common/thrift/Frontend.thrift File common/thrift/Frontend.thrift: Line 206: struct TShowFilesParams { > I think the best way to avoid inconsistency is doing partition pruning in C That's an interesting idea, but I don't think we should follow it. We try to maintain separation between the catalog and impalads. That change would require us to do expression analysis and even expression evaluation (in native code) on the catalog server. I think we'd need to do the partition pruning while holding the global catalog lock, otherwise someone could still run a concurrent invalidate metadata and change the partition ids. Partition pruning can take a significant amount of time and resources, so the extra time and load (while holding a lock) could become an issue. http://gerrit.cloudera.org:8080/#/c/1563/9/fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java File fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java: Line 435: Preconditions.checkState( > this ACL thing is a little obscure for me. I may need some time to figure i That's understandable :). Please reach out if I can assist. http://gerrit.cloudera.org:8080/#/c/1563/9/fe/src/main/java/com/cloudera/impala/analysis/PartitionSpec.java File fe/src/main/java/com/cloudera/impala/analysis/PartitionSpec.java: Line 130: if (isKV_) { > This was discussed in Patchset 1. You mentioned "Ideally, we'd only have a Sorry for the confusion. I guess I needed to see the resulting code before understanding. The idea was that, if sensible, we should unify all this partition logic. But the logic for ADD PARTITION and LOAD DATA are so different that it makes more sense to split them up. Line 157: } Shouldn't we have an else that throws here? Line 229: // while "SlotRef <op> SlotRef" and "Literal <op> Literal" don't. > This was also about indicating user errors too. I think using more than one I'm not sure I agree. Suppose a user has a table partitioned by year, month, day and wants to select all partitions that fall within the last 6 months before now() (note that the timeframe can span two years). A reasonable approach could be to combine the year+month+day into a timestamp and then use timestamp functions to do the date computation. That expression would mention all partition columns. I think it would be difficult to express that set of partitions with the current (well-intended) limitations. The "Literal <op> Literal" is dangerous because it selects either no or all partitions - not an interesting use case. Line 247: if (bp.getOp().equals(EQ)) { > I don't think it is necessary here. Because what we are trying to do is bac I'd agree with you if it meant significantly complicating the code, but it's pretty easy to do. Just call Expr.foldConstantChildren() first. Otherwise, we have subtle baked-in inconsistencies in behavior, for example: p = NULL will behave different from p = NULL + 10 I agree it's a strange use case, but most queries are not written by humans. http://gerrit.cloudera.org:8080/#/c/1563/9/shell/impala_client.py File shell/impala_client.py: Line 444: excluded_query_types = ['use', 'drop'] > which tests should I run? I think it's sufficient if you try running a few of both types of few ALTER TABLE statements through the shell (those stmt that do and do not return a result set), and report your findings. http://gerrit.cloudera.org:8080/#/c/1563/9/testdata/workloads/functional-query/queries/QueryTest/partition-ddl-predicates.test File testdata/workloads/functional-query/queries/QueryTest/partition-ddl-predicates.test: Line 42: show files in p1 partition (j<2, k="a") > It shows you files in the returned partition set. But isn't (j<2 and k="a") a single compound predicate that references multiple partition columns? What I'm trying to say is that I think we need to split up each partition expr into its conjuncts and then start the partition pruning process. Consider another example, show files in t partition (j<2 and k="", j IS NULL and k IS NULL) the partition spec will get two expressions that are compound AND predicates You can use Expr.getConjuncts() -- To view, visit http://gerrit.cloudera.org:8080/1563 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2c9162fcf9d227b8daf4c2e761d57bab4e26408f Gerrit-PatchSet: 9 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Amos Bird <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Amos Bird <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Jim Apple <[email protected]> Gerrit-HasComments: Yes
