Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-3210: last/first_value() support for IGNORE NULLS ......................................................................
Patch Set 5: (19 comments) http://gerrit.cloudera.org:8080/#/c/3328/5/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: PS5, Line 1371: // TODO: Rename to something like 'SetVal' so its more natural to use with other UDAs. > would you mind doing this? Done PS5, Line 1454: src > Call with T::null() to be explicit, maybe use the same comment you have bel Done PS5, Line 1470: We c > remove We Done PS5, Line 1471: for us > remove for us Done PS5, Line 1515: state->last_val.ptr > only do this if last_val is not null Done PS5, Line 1547: we encounter. > encountered. Done PS5, Line 1555: we encounter > encountered Done http://gerrit.cloudera.org:8080/#/c/3328/5/be/src/exprs/aggregate-functions.h File be/src/exprs/aggregate-functions.h: PS5, Line 251: Requires a start bound of UNBOUNDED PRECEDING. > I think you can leave this out here, it's not particularly actionable to pe Unless I'm missing something, the functions don't have access to the window, so we couldn't put a DCHECK in Update(), unless we add the window to the FunctionContext. Or we could put a the DCHECK in AnalyticEvalNode::Init. At the very least, I think it should be documented somewhere, even if its not right here. http://gerrit.cloudera.org:8080/#/c/3328/5/fe/src/main/java/com/cloudera/impala/analysis/AnalyticExpr.java File fe/src/main/java/com/cloudera/impala/analysis/AnalyticExpr.java: PS5, Line 553: function > ... in an equivalent execution. Only applies to analytic functions with a w Done PS5, Line 555: reverse > Maybe rename to reverseWindowFn to better indicate this only applies to fun Done Line 570: fnCall_.analyzeNoThrow(analyzer); > // Use getType() instead if getReturnType() because wildcard decimals Done PS5, Line 589: '_ignore_nulls" > inconsistent quote/double quotes Done PS5, Line 590: so that the BE doesn't have to know about : * IGNORE NULLS > because the BE implements them as different functions. Done PS5, Line 661: // Use getType() instead of getReturnType() because wildcard decimals : // have only been resolved in the former. : // TODO: Is this necessary? : Preconditions.checkState(type_.equals(fnCall_.getType())); > Can you put this in reverse() after we analyze the new fnCall_ ? I think it Done http://gerrit.cloudera.org:8080/#/c/3328/5/fe/src/main/java/com/cloudera/impala/analysis/FunctionCallExpr.java File fe/src/main/java/com/cloudera/impala/analysis/FunctionCallExpr.java: PS5, Line 468: IGNORE NULLS may only be used with analytic functions. > Can you just say this function doesn't support IGNORE NULLS rather than say Done http://gerrit.cloudera.org:8080/#/c/3328/5/fe/src/main/java/com/cloudera/impala/catalog/BuiltinsDb.java File fe/src/main/java/com/cloudera/impala/catalog/BuiltinsDb.java: PS5, Line 967: createAnalyticBuiltin > can you call the overload which has the bool parameter at the end and set i Done http://gerrit.cloudera.org:8080/#/c/3328/5/fe/src/test/java/com/cloudera/impala/planner/PlannerTest.java File fe/src/test/java/com/cloudera/impala/planner/PlannerTest.java: Line 57: runPlannerTestFile("analytic-fns-ignore-nulls"); > Any reason these are separate files/tests? If not, can you merge them to re Done http://gerrit.cloudera.org:8080/#/c/3328/5/testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test File testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test: PS5, Line 743: ASC NULLS LAST > I don't think this is necessarily an issue, but it's weird to me that now t The reason that this happens is that when OrderByElement.reverse creates new OrderByElements it always assigns either true or false to 'nullsFirst', even if the original OrderByElement's 'nullsFirst' was NULL. As you say, though, "NULLS LAST" is the default for ASC, and this plan is the same from the perspective of the backend as an equivalent plan where "NULLS LAST" wasn't explicit, so I don't think it should matter. PS5, Line 1545: 07:ANALYTIC : | functions: first_value(bigint_col) : | partition by: tinyint_col : | order by: id ASC : | window: RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING : | : 06:ANALYTIC : | functions: first_value(bigint_col) : | partition by: tinyint_col : | order by: id ASC : | window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW : | : 05:ANALYTIC : | functions: first_value(int_col) : | partition by: tinyint_col : | order by: id ASC, bool_col DESC > One of the simplifications we made does result in less grouping than we had Its true that we get less grouping in this particular situation, and that in general this patch will result in more types of windows being used since its less aggressive about standardizing things (assuming that all possible input windows are used evenly, which is probably not true). But, its not entirely clear that the changes that were made will result in less grouping overall, since first_value_rewrite couldn't be grouped with anything else (as enforced by AnalyticPlanner.requiresIndependentEval, which this patch removes). It really depends on what combinations of bounds are more or less common. -- To view, visit http://gerrit.cloudera.org:8080/3328 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic27525e2237fb54318549d2674f1610884208e9b Gerrit-PatchSet: 5 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-HasComments: Yes
