Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3210: last/first_value() support for IGNORE NULLS ......................................................................
Patch Set 2: (14 comments) Thanks! Can you also try building with ASAN [1] and running the tests to check for memory corruption? I'd also recommend running some of the same queries you have but on something larger than functional tables, e.g. tpch.lineitem or so. http://gerrit.cloudera.org:8080/#/c/3328/1/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: Line 1428: reinterpret_cast<LastValIgnoreNullsState<T>*>(dst->ptr); please initialize num_nulls to 0 explicitly (even though AllocBuffer does a memset). http://gerrit.cloudera.org:8080/#/c/3328/2/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: PS2, Line 1418: LastValIgnoreNullsState An overview of how this all works would be helpful to other readers. A comment above this seems like a reasonable place to do so. PS2, Line 1429: state need to initialize num_nulls to 0 as well PS2, Line 1445: state->last_val I think we need to handle strings specially. If this wasn't null before and this is a string type, we won't free that memory.However, I think you can get away with just calling LVU with a null since it has a StringVal implementation that frees the memory properly. E.g. replace this with: LastValUpdate(ctx, T::null(), &state->last_val); PS2, Line 1460: state->last_val = T::null() same deal about releasing memory. You could call LVU(ctx, T::null(), &state->last_val); again, but just add a good comment because it'll be confusing for Remove to be calling update :) Or better yet, we could even rename the LastValUpdate mthds to just something like SetVal, since that's pretty much all it does. Feel free to just add a TODO about that. PS2, Line 1466: if (UNLIKELY(src.is_null)) return T::null(); Hm, why is this necessary? I can't think of a case where the intermediate state itself is null. We typically have a DCHECK != null here. PS2, Line 1476: if (UNLIKELY(src.is_null)) return StringVal::null(); same here and in the Finalize methods PS2, Line 1491: if (UNLIKELY(src.is_null)) return T::null(); : DCHECK_EQ(sizeof(LastValIgnoreNullsState<T>), src.len); : LastValIgnoreNullsState<T>* state = : reinterpret_cast<LastValIgnoreNullsState<T>*>(src.ptr); : T result = state->last_val; I think you can replace this with: T result = LastValIgnoreNullsGetValue(ctx, src); PS2, Line 1508: StringVal result; : if (state->last_val.is_null) { : result = StringVal::null(); : } else { : result = StringVal::CopyFrom(ctx, state->last_val.ptr, state->last_val.len); : } You can replace this with a call to LastValIgnoreNullsGetValue(ctx, src); like above PS2, Line 1515: ctx->Free I think you also need to free state->last_val.ptr http://gerrit.cloudera.org:8080/#/c/3328/2/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS2, Line 2423: function_name:fn_name LPAREN function_params:params KW_IGNORE KW_NULLS RPAREN Let's see whatAlex and/or Marcel thinks, but perhaps we remove this branch here, and instead create an analytic_fn_call_expr nonterminal which is just function_call_expr OR this branch. That might be nice to factor it out. Let's chat with them first though. http://gerrit.cloudera.org:8080/#/c/3328/2/fe/src/main/java/com/cloudera/impala/analysis/AnalyticExpr.java File fe/src/main/java/com/cloudera/impala/analysis/AnalyticExpr.java: PS2, Line 579: * 3. FIRST_VALUE without UNBOUNDED PRECEDING gets rewritten to use a different window : * and change the function to return the last value. We either set the fn to be : * 'last_value' or 'first_value_rewrite', which simply wraps the 'last_value' : * implementation but allows us to handle the first rows in a partition in a special : * way in the backend. There are a few cases: : * a) IGNORE NULLS: : * We cannot change the values that appear in the window since there may be any : * number of NULLs that must be ignored, so we can't take advantage of the : * performance optimizations in the other cases. Just reverse the window and : * ordering and use 'last_value'. : * b) Start bound is X FOLLOWING or CURRENT ROW (X=0): : * Use 'last_value' with a window where both bounds are X FOLLOWING (or : * CURRENT ROW). Setting the start bound to X following is necessary because the : * X rows at the end of a partition have no rows in their window. Note that X : * FOLLOWING could be rewritten as lead(X) but that would not work for CURRENT : * ROW. : * c) Start bound is X PRECEDING and end bound is CURRENT ROW or FOLLOWING: : * Use 'first_value_rewrite' and a window with an end bound X PRECEDING. An : * extra parameter '-1' is added to indicate to the backend that NULLs should : * not be added for the first X rows. : * d) Start bound is X PRECEDING and end bound is Y PRECEDING: : * Use 'first_value_rewrite' and a window with an end bound X PRECEDING. The : * first Y rows in a partition have empty windows and should be NULL. An extra : * parameter with the integer constant Y is added to indicate to the backend : * that NULLs should be added for the first Y rows. This is too complicated (and was to begin with, so not your fault). However, I still think we can just simplify this now by rewriting to LV (w/ reversed ordering/window). I think you had a concern but I'm pretty sure it works and greatly simplifies this code. There could be a small perf impact, but I can't imagine it's not worth it, and the analytic eval isn't even close to optimized yet anyway. I tried it out and tested it, seems to work. Here's my delta: http://github.mtv.cloudera.com/mj/Impala/commit/bd4381c2e387a36dd5bc215bc7cab2e92951fb2c http://gerrit.cloudera.org:8080/#/c/3328/2/fe/src/main/java/com/cloudera/impala/analysis/IgnoreNullsExpr.java File fe/src/main/java/com/cloudera/impala/analysis/IgnoreNullsExpr.java: I know I led you down this path, and it seems like the fastest way to get this up and running, but I think we should chat with Marcel/Alex about potentially other options for modeling this (e.g. another option could be having an additional flag on FunctionParams like we do for DISTINCT, though that will be ugly too). PS2, Line 30: IgnoreNullsExpr Hm the name should probably indicate it's a kind of FunctionCall, e.g. FunctionCallIgnoreNullsExpr -- To view, visit http://gerrit.cloudera.org:8080/3328 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic27525e2237fb54318549d2674f1610884208e9b Gerrit-PatchSet: 2 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-HasComments: Yes
