Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3210: last/first_value() support for IGNORE NULLS
......................................................................


Patch Set 5:

(24 comments)

Looking good! Still have a few more test files to look at, but here's the bulk 
of my review.

http://gerrit.cloudera.org:8080/#/c/3328/2/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

PS2, Line 539: Status AnalyticEvalNode::ProcessChildBatches(RuntimeState* 
state) {
             :   // Consume child batches until eos or there are enough rows to 
return more than an
             :   // output batch. Ensuring there is at least one more row left 
after returning results
             :   // allows us to simplify the logic dealing with 
last_result_idx_ and result_tuples_.
             :   while (!input_eos_ && NumOutputRowsReady() < 
state->batch_size() + 1) {
             :     RETURN_IF_CANCELLED(state);
             :     RETURN_IF_ERROR(child(0)->GetNext(state, 
curr_child_batch_.get(), &input_eos_));
             :     RETURN_IF_ERROR(QueryMaintenance(state));
             :     RETURN_IF_ERROR(ProcessChildBatch(state));
             :     // TODO: DCHECK that the size of result_tuples_ is bounded. 
It shouldn't be larger
             :     // than 2x the batch size unless the end bound has an offset 
preceding, in which
             :     // case it may be slightly larger (proportional to the 
offset but still bounded).
             :     prev_child_batch_->Reset();
             :     prev_child_batch_.swap(curr_child_batch_);
             :   }
             :   if (input_eos_) {
             :     cur
I love this red


http://gerrit.cloudera.org:8080/#/c/3328/5/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS5, Line 1371: // TODO: Rename to something like 'SetVal' so its more natural 
to use with other UDAs.
would you mind doing this?


PS5, Line 1454: src
Call with T::null() to be explicit, maybe use the same comment you have below 
as well.


PS5, Line 1470: We c
remove We


PS5, Line 1471:  for us
remove for us


PS5, Line 1515: state->last_val.ptr
only do this if last_val is not null


PS5, Line 1547:  we encounter.
encountered.

Note this is different from FirstValUpdate() which has to check num_updates() 
because the first value may be NULL and we don't want to skip it in that case.


PS5, Line 1555: we encounter
encountered


http://gerrit.cloudera.org:8080/#/c/3328/5/be/src/exprs/aggregate-functions.h
File be/src/exprs/aggregate-functions.h:

PS5, Line 251: Requires a start bound of UNBOUNDED PRECEDING.
I think you can leave this out here, it's not particularly actionable to people 
reading this header. It'd be useful if we could find a good way to add a DCHECK 
to verify this though, but I'm not sure there's a great place to put that. 
Maybe just put this in the implementation of Update().


PS5, Line 254:  Requires a start bound of UNBOUNDED PRECEDING.
same


http://gerrit.cloudera.org:8080/#/c/3328/5/fe/src/main/java/com/cloudera/impala/analysis/AnalyticExpr.java
File fe/src/main/java/com/cloudera/impala/analysis/AnalyticExpr.java:

PS5, Line 553: function
... in an equivalent execution. Only applies to analytic functions with a 
window, i.e. ranking fns and offset fns are not supported.


Line 555:   private void reverse(Analyzer analyzer, FunctionName 
analyticFnName) {
DCHECK that window_ is not null.


PS5, Line 555: FunctionName analyticFnName
Does this need to be a parameter or can you just get the FunctionName from 
fnCall_ ?


PS5, Line 555: reverse
Maybe rename to reverseWindowFn to better indicate this only applies to 
functions with a window_


Line 570:       fnCall_.analyzeNoThrow(analyzer);
// Use getType() instead if getReturnType() because wildcard decimals
      // have only been resolved in the former.
      type_ = fnCall_.getType();


PS5, Line 589: '_ignore_nulls"
inconsistent quote/double quotes


PS5, Line 590: so that the BE doesn't have to know about
             :    *    IGNORE NULLS
because the BE implements them as different functions.


PS5, Line 661:       // Use getType() instead of getReturnType() because 
wildcard decimals
             :       // have only been resolved in the former.
             :       // TODO: Is this necessary?
             :       Preconditions.checkState(type_.equals(fnCall_.getType()));
Can you put this in reverse() after we analyze the new fnCall_ ? I think it'd 
make more sense there. Also the TODO can be removed.


http://gerrit.cloudera.org:8080/#/c/3328/5/fe/src/main/java/com/cloudera/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/com/cloudera/impala/analysis/FunctionCallExpr.java:

PS5, Line 468: IGNORE NULLS may only be used with analytic functions.
Can you just say this function doesn't support IGNORE NULLS rather than saying 
it can only be used with A.F.  (Most AF don't support it either.) It'd be good 
to have the same text as the error in AnalyticExpr.


http://gerrit.cloudera.org:8080/#/c/3328/5/fe/src/main/java/com/cloudera/impala/catalog/BuiltinsDb.java
File fe/src/main/java/com/cloudera/impala/catalog/BuiltinsDb.java:

PS5, Line 967: createAnalyticBuiltin
can you call the overload which has the bool parameter at the end and set it to 
false? that makes the function hidden from "show functions".


http://gerrit.cloudera.org:8080/#/c/3328/5/fe/src/main/java/com/cloudera/impala/planner/AnalyticPlanner.java
File fe/src/main/java/com/cloudera/impala/planner/AnalyticPlanner.java:

PS5, Line 498: 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
nice!


http://gerrit.cloudera.org:8080/#/c/3328/5/fe/src/test/java/com/cloudera/impala/planner/PlannerTest.java
File fe/src/test/java/com/cloudera/impala/planner/PlannerTest.java:

Line 57:     runPlannerTestFile("analytic-fns-ignore-nulls");
Any reason these are separate files/tests? If not, can you merge them to reduce 
the number of files we have floating around. Same for the query tests as well.


http://gerrit.cloudera.org:8080/#/c/3328/5/testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test:

PS5, Line 743: ASC NULLS LAST
I don't think this is necessarily an issue, but it's weird to me that now this 
is explicit. It shouldn't matter since NULLS LAST is the default for ASC, but 
it may mean something changed that we didn't realize when we flipped the 
orderby exprs.

Let's see if Marcel thinks this is a concern.


PS5, Line 1545: 07:ANALYTIC
              : |  functions: first_value(bigint_col)
              : |  partition by: tinyint_col
              : |  order by: id ASC
              : |  window: RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED 
FOLLOWING
              : |
              : 06:ANALYTIC
              : |  functions: first_value(bigint_col)
              : |  partition by: tinyint_col
              : |  order by: id ASC
              : |  window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
              : |
              : 05:ANALYTIC
              : |  functions: first_value(int_col)
              : |  partition by: tinyint_col
              : |  order by: id ASC, bool_col DESC
One of the simplifications we made does result in less grouping than we had 
before, but I think we can still make the argument it is worth it. We'll see 
what Marcel thinks.


-- 
To view, visit http://gerrit.cloudera.org:8080/3328
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic27525e2237fb54318549d2674f1610884208e9b
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-HasComments: Yes

Reply via email to