Alex Behm has posted comments on this change.

Change subject: IMPALA-1654: Support general predicates in most partition DDL 
operations.
......................................................................


Patch Set 9:

(8 comments)

Agree with all points except 1. Let's not start changing the roles/separation 
between the components of the system. I'm not saying we should never do that, 
but let's not do that in this patch. See some reasons in my response in 
Frontend.thrift.

http://gerrit.cloudera.org:8080/#/c/1563/9/common/thrift/Frontend.thrift
File common/thrift/Frontend.thrift:

Line 206: struct TShowFilesParams {
> I think the best way to avoid inconsistency is doing partition pruning in C
That's an interesting idea, but I don't think we should follow it.

We try to maintain separation between the catalog and impalads. That change 
would require us to do expression analysis and even expression evaluation (in 
native code) on the catalog server.

I think we'd need to do the partition pruning while holding the global catalog 
lock, otherwise someone could still run a concurrent invalidate metadata and 
change the partition ids. Partition pruning can take a significant amount of 
time and resources, so the extra time and load (while holding a lock) could 
become an issue.


http://gerrit.cloudera.org:8080/#/c/1563/9/fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java
File fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java:

Line 435:         Preconditions.checkState(
> this ACL thing is a little obscure for me. I may need some time to figure i
That's understandable :). Please reach out if I can assist.


http://gerrit.cloudera.org:8080/#/c/1563/9/fe/src/main/java/com/cloudera/impala/analysis/PartitionSpec.java
File fe/src/main/java/com/cloudera/impala/analysis/PartitionSpec.java:

Line 130:     if (isKV_) {
> This was discussed in Patchset 1. You mentioned "Ideally, we'd only have a 
Sorry for the confusion. I guess I needed to see the resulting code before 
understanding. The idea was that, if sensible, we should unify all this 
partition logic. But the logic for ADD PARTITION and LOAD DATA are so different 
that it makes more sense to split them up.


Line 157:         }
Shouldn't we have an else that throws here?


Line 229:     // while "SlotRef <op> SlotRef" and "Literal <op> Literal" don't.
> This was also about indicating user errors too. I think using more than one
I'm not sure I agree. Suppose a user has a table partitioned by year, month, 
day and wants to select all partitions that fall within the last 6 months 
before now() (note that the timeframe can span two years). A reasonable 
approach could be to combine the year+month+day into a timestamp and then use 
timestamp functions to do the date computation. That expression would mention 
all partition columns.
I think it would be difficult to express that set of partitions with the 
current (well-intended) limitations.

The "Literal <op> Literal" is dangerous because it selects either no or all 
partitions - not an interesting use case.


Line 247:         if (bp.getOp().equals(EQ)) {
> I don't think it is necessary here. Because what we are trying to do is bac
I'd agree with you if it meant significantly complicating the code, but it's 
pretty easy to do. Just call Expr.foldConstantChildren() first. Otherwise, we 
have subtle baked-in inconsistencies in behavior, for example:

p = NULL
will behave different from
p = NULL + 10

I agree it's a strange use case, but most queries are not written by humans.


http://gerrit.cloudera.org:8080/#/c/1563/9/shell/impala_client.py
File shell/impala_client.py:

Line 444:     excluded_query_types = ['use', 'drop']
> which tests should I run?
I think it's sufficient if you try running a few of both types of few ALTER 
TABLE statements through the shell (those stmt that do and do not return a 
result set), and report your findings.


http://gerrit.cloudera.org:8080/#/c/1563/9/testdata/workloads/functional-query/queries/QueryTest/partition-ddl-predicates.test
File 
testdata/workloads/functional-query/queries/QueryTest/partition-ddl-predicates.test:

Line 42: show files in p1 partition (j<2, k="a")
> It shows you files in the returned partition set.
But isn't (j<2 and k="a") a single compound predicate that references multiple 
partition columns? What I'm trying to say is that I think we need to split up 
each partition expr into its conjuncts and then start the partition pruning 
process. Consider another example,

show files in t partition (j<2 and k="", j IS NULL and k IS NULL)

the partition spec will get two expressions that are compound AND predicates

You can use Expr.getConjuncts()


-- 
To view, visit http://gerrit.cloudera.org:8080/1563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2c9162fcf9d227b8daf4c2e761d57bab4e26408f
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Amos Bird <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Amos Bird <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-HasComments: Yes

Reply via email to