Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3210: last/first_value() support for IGNORE NULLS
......................................................................


Patch Set 2:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/3328/2/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS2, Line 1418: LastValIgnoreNullsState
> An overview of how this all works would be helpful to other readers. A comm
Done


PS2, Line 1429: state
> need to initialize num_nulls to 0 as well
Done


PS2, Line 1445: state->last_val
> I think we need to handle strings specially. If this wasn't null before and
Done


PS2, Line 1460: state->last_val = T::null()
> same deal about releasing memory. You could call LVU(ctx, T::null(), &state
Done


PS2, Line 1466: if (UNLIKELY(src.is_null)) return T::null();
> Hm, why is this necessary? I can't think of a case where the intermediate s
We're inconsistent about this - we (generally) have either a DCHECK or no check 
at all in the GetValue functions in this file but an 'if' check like this in 
the Finalize functions.

I agree though that it seems like this should never happen, and the tests pass 
with a DCHECK, so of course a DCHECK is better.


PS2, Line 1476:   if (UNLIKELY(src.is_null)) return StringVal::null();
> same here and in the Finalize methods
Done


PS2, Line 1491: if (UNLIKELY(src.is_null)) return T::null();
              :   DCHECK_EQ(sizeof(LastValIgnoreNullsState<T>), src.len);
              :   LastValIgnoreNullsState<T>* state =
              :       reinterpret_cast<LastValIgnoreNullsState<T>*>(src.ptr);
              :   T result = state->last_val;
> I think you can replace this with:
Done


PS2, Line 1508:   StringVal result;
              :   if (state->last_val.is_null) {
              :     result = StringVal::null();
              :   } else {
              :     result = StringVal::CopyFrom(ctx, state->last_val.ptr, 
state->last_val.len);
              :   }
> You can replace this with a call to LastValIgnoreNullsGetValue(ctx, src); l
Done


PS2, Line 1515: ctx->Free
> I think you also need to free state->last_val.ptr
Done


http://gerrit.cloudera.org:8080/#/c/3328/2/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

PS2, Line 2423: function_name:fn_name LPAREN function_params:params KW_IGNORE 
KW_NULLS RPAREN
> Let's see whatAlex and/or Marcel thinks, but perhaps we remove this branch 
Done


http://gerrit.cloudera.org:8080/#/c/3328/2/fe/src/main/java/com/cloudera/impala/analysis/AnalyticExpr.java
File fe/src/main/java/com/cloudera/impala/analysis/AnalyticExpr.java:

PS2, Line 579:    * 3. FIRST_VALUE without UNBOUNDED PRECEDING gets rewritten 
to use a different window
             :    *    and change the function to return the last value. We 
either set the fn to be
             :    *    'last_value' or 'first_value_rewrite', which simply 
wraps the 'last_value'
             :    *    implementation but allows us to handle the first rows in 
a partition in a special
             :    *    way in the backend. There are a few cases:
             :    *     a) IGNORE NULLS:
             :    *        We cannot change the values that appear in the 
window since there may be any
             :    *        number of NULLs that must be ignored, so we can't 
take advantage of the
             :    *        performance optimizations in the other cases. Just 
reverse the window and
             :    *        ordering and use 'last_value'.
             :    *     b) Start bound is X FOLLOWING or CURRENT ROW (X=0):
             :    *        Use 'last_value' with a window where both bounds are 
X FOLLOWING (or
             :    *        CURRENT ROW). Setting the start bound to X following 
is necessary because the
             :    *        X rows at the end of a partition have no rows in 
their window. Note that X
             :    *        FOLLOWING could be rewritten as lead(X) but that 
would not work for CURRENT
             :    *        ROW.
             :    *     c) Start bound is X PRECEDING and end bound is CURRENT 
ROW or FOLLOWING:
             :    *        Use 'first_value_rewrite' and a window with an end 
bound X PRECEDING. An
             :    *        extra parameter '-1' is added to indicate to the 
backend that NULLs should
             :    *        not be added for the first X rows.
             :    *     d) Start bound is X PRECEDING and end bound is Y 
PRECEDING:
             :    *        Use 'first_value_rewrite' and a window with an end 
bound X PRECEDING. The
             :    *        first Y rows in a partition have empty windows and 
should be NULL. An extra
             :    *        parameter with the integer constant Y is added to 
indicate to the backend
             :    *        that NULLs should be added for the first Y rows.
> This is too complicated (and was to begin with, so not your fault). However
Done


http://gerrit.cloudera.org:8080/#/c/3328/2/fe/src/main/java/com/cloudera/impala/analysis/IgnoreNullsExpr.java
File fe/src/main/java/com/cloudera/impala/analysis/IgnoreNullsExpr.java:

> I know I led you down this path, and it seems like the fastest way to get t
Done


PS2, Line 30: IgnoreNullsExpr
> Hm the name should probably indicate it's a kind of FunctionCall, e.g. Func
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3328
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic27525e2237fb54318549d2674f1610884208e9b
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-HasComments: Yes

Reply via email to