Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3210: last/first_value() support for IGNORE NULLS ......................................................................
Patch Set 5: (24 comments) Looking good! Still have a few more test files to look at, but here's the bulk of my review. http://gerrit.cloudera.org:8080/#/c/3328/2/be/src/exec/analytic-eval-node.cc File be/src/exec/analytic-eval-node.cc: PS2, Line 539: Status AnalyticEvalNode::ProcessChildBatches(RuntimeState* state) { : // Consume child batches until eos or there are enough rows to return more than an : // output batch. Ensuring there is at least one more row left after returning results : // allows us to simplify the logic dealing with last_result_idx_ and result_tuples_. : while (!input_eos_ && NumOutputRowsReady() < state->batch_size() + 1) { : RETURN_IF_CANCELLED(state); : RETURN_IF_ERROR(child(0)->GetNext(state, curr_child_batch_.get(), &input_eos_)); : RETURN_IF_ERROR(QueryMaintenance(state)); : RETURN_IF_ERROR(ProcessChildBatch(state)); : // TODO: DCHECK that the size of result_tuples_ is bounded. It shouldn't be larger : // than 2x the batch size unless the end bound has an offset preceding, in which : // case it may be slightly larger (proportional to the offset but still bounded). : prev_child_batch_->Reset(); : prev_child_batch_.swap(curr_child_batch_); : } : if (input_eos_) { : cur I love this red http://gerrit.cloudera.org:8080/#/c/3328/5/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: PS5, Line 1371: // TODO: Rename to something like 'SetVal' so its more natural to use with other UDAs. would you mind doing this? PS5, Line 1454: src Call with T::null() to be explicit, maybe use the same comment you have below as well. PS5, Line 1470: We c remove We PS5, Line 1471: for us remove for us PS5, Line 1515: state->last_val.ptr only do this if last_val is not null PS5, Line 1547: we encounter. encountered. Note this is different from FirstValUpdate() which has to check num_updates() because the first value may be NULL and we don't want to skip it in that case. PS5, Line 1555: we encounter encountered http://gerrit.cloudera.org:8080/#/c/3328/5/be/src/exprs/aggregate-functions.h File be/src/exprs/aggregate-functions.h: PS5, Line 251: Requires a start bound of UNBOUNDED PRECEDING. I think you can leave this out here, it's not particularly actionable to people reading this header. It'd be useful if we could find a good way to add a DCHECK to verify this though, but I'm not sure there's a great place to put that. Maybe just put this in the implementation of Update(). PS5, Line 254: Requires a start bound of UNBOUNDED PRECEDING. same http://gerrit.cloudera.org:8080/#/c/3328/5/fe/src/main/java/com/cloudera/impala/analysis/AnalyticExpr.java File fe/src/main/java/com/cloudera/impala/analysis/AnalyticExpr.java: PS5, Line 553: function ... in an equivalent execution. Only applies to analytic functions with a window, i.e. ranking fns and offset fns are not supported. Line 555: private void reverse(Analyzer analyzer, FunctionName analyticFnName) { DCHECK that window_ is not null. PS5, Line 555: FunctionName analyticFnName Does this need to be a parameter or can you just get the FunctionName from fnCall_ ? PS5, Line 555: reverse Maybe rename to reverseWindowFn to better indicate this only applies to functions with a window_ Line 570: fnCall_.analyzeNoThrow(analyzer); // Use getType() instead if getReturnType() because wildcard decimals // have only been resolved in the former. type_ = fnCall_.getType(); PS5, Line 589: '_ignore_nulls" inconsistent quote/double quotes PS5, Line 590: so that the BE doesn't have to know about : * IGNORE NULLS because the BE implements them as different functions. PS5, Line 661: // Use getType() instead of getReturnType() because wildcard decimals : // have only been resolved in the former. : // TODO: Is this necessary? : Preconditions.checkState(type_.equals(fnCall_.getType())); Can you put this in reverse() after we analyze the new fnCall_ ? I think it'd make more sense there. Also the TODO can be removed. http://gerrit.cloudera.org:8080/#/c/3328/5/fe/src/main/java/com/cloudera/impala/analysis/FunctionCallExpr.java File fe/src/main/java/com/cloudera/impala/analysis/FunctionCallExpr.java: PS5, Line 468: IGNORE NULLS may only be used with analytic functions. Can you just say this function doesn't support IGNORE NULLS rather than saying it can only be used with A.F. (Most AF don't support it either.) It'd be good to have the same text as the error in AnalyticExpr. http://gerrit.cloudera.org:8080/#/c/3328/5/fe/src/main/java/com/cloudera/impala/catalog/BuiltinsDb.java File fe/src/main/java/com/cloudera/impala/catalog/BuiltinsDb.java: PS5, Line 967: createAnalyticBuiltin can you call the overload which has the bool parameter at the end and set it to false? that makes the function hidden from "show functions". http://gerrit.cloudera.org:8080/#/c/3328/5/fe/src/main/java/com/cloudera/impala/planner/AnalyticPlanner.java File fe/src/main/java/com/cloudera/impala/planner/AnalyticPlanner.java: PS5, Line 498: : : : : : : : : : : : : : : : : nice! http://gerrit.cloudera.org:8080/#/c/3328/5/fe/src/test/java/com/cloudera/impala/planner/PlannerTest.java File fe/src/test/java/com/cloudera/impala/planner/PlannerTest.java: Line 57: runPlannerTestFile("analytic-fns-ignore-nulls"); Any reason these are separate files/tests? If not, can you merge them to reduce the number of files we have floating around. Same for the query tests as well. http://gerrit.cloudera.org:8080/#/c/3328/5/testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test File testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test: PS5, Line 743: ASC NULLS LAST I don't think this is necessarily an issue, but it's weird to me that now this is explicit. It shouldn't matter since NULLS LAST is the default for ASC, but it may mean something changed that we didn't realize when we flipped the orderby exprs. Let's see if Marcel thinks this is a concern. PS5, Line 1545: 07:ANALYTIC : | functions: first_value(bigint_col) : | partition by: tinyint_col : | order by: id ASC : | window: RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING : | : 06:ANALYTIC : | functions: first_value(bigint_col) : | partition by: tinyint_col : | order by: id ASC : | window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW : | : 05:ANALYTIC : | functions: first_value(int_col) : | partition by: tinyint_col : | order by: id ASC, bool_col DESC One of the simplifications we made does result in less grouping than we had before, but I think we can still make the argument it is worth it. We'll see what Marcel thinks. -- To view, visit http://gerrit.cloudera.org:8080/3328 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic27525e2237fb54318549d2674f1610884208e9b Gerrit-PatchSet: 5 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-HasComments: Yes
