Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3210: last/first_value() support for IGNORE NULLS
......................................................................


Patch Set 5:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/3328/5/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS5, Line 1371: // TODO: Rename to something like 'SetVal' so its more natural 
to use with other UDAs.
> would you mind doing this?
Done


PS5, Line 1454: src
> Call with T::null() to be explicit, maybe use the same comment you have bel
Done


PS5, Line 1470: We c
> remove We
Done


PS5, Line 1471:  for us
> remove for us
Done


PS5, Line 1515: state->last_val.ptr
> only do this if last_val is not null
Done


PS5, Line 1547:  we encounter.
> encountered.
Done


PS5, Line 1555: we encounter
> encountered
Done


http://gerrit.cloudera.org:8080/#/c/3328/5/be/src/exprs/aggregate-functions.h
File be/src/exprs/aggregate-functions.h:

PS5, Line 251: Requires a start bound of UNBOUNDED PRECEDING.
> I think you can leave this out here, it's not particularly actionable to pe
Unless I'm missing something, the functions don't have access to the window, so 
we couldn't put a DCHECK in Update(), unless we add the window to the 
FunctionContext. Or we could put a the DCHECK in AnalyticEvalNode::Init. 
At the very least, I think it should be documented somewhere, even if its not 
right here.


http://gerrit.cloudera.org:8080/#/c/3328/5/fe/src/main/java/com/cloudera/impala/analysis/AnalyticExpr.java
File fe/src/main/java/com/cloudera/impala/analysis/AnalyticExpr.java:

PS5, Line 553: function
> ... in an equivalent execution. Only applies to analytic functions with a w
Done


PS5, Line 555: reverse
> Maybe rename to reverseWindowFn to better indicate this only applies to fun
Done


Line 570:       fnCall_.analyzeNoThrow(analyzer);
> // Use getType() instead if getReturnType() because wildcard decimals
Done


PS5, Line 589: '_ignore_nulls"
> inconsistent quote/double quotes
Done


PS5, Line 590: so that the BE doesn't have to know about
             :    *    IGNORE NULLS
> because the BE implements them as different functions.
Done


PS5, Line 661:       // Use getType() instead of getReturnType() because 
wildcard decimals
             :       // have only been resolved in the former.
             :       // TODO: Is this necessary?
             :       Preconditions.checkState(type_.equals(fnCall_.getType()));
> Can you put this in reverse() after we analyze the new fnCall_ ? I think it
Done


http://gerrit.cloudera.org:8080/#/c/3328/5/fe/src/main/java/com/cloudera/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/com/cloudera/impala/analysis/FunctionCallExpr.java:

PS5, Line 468: IGNORE NULLS may only be used with analytic functions.
> Can you just say this function doesn't support IGNORE NULLS rather than say
Done


http://gerrit.cloudera.org:8080/#/c/3328/5/fe/src/main/java/com/cloudera/impala/catalog/BuiltinsDb.java
File fe/src/main/java/com/cloudera/impala/catalog/BuiltinsDb.java:

PS5, Line 967: createAnalyticBuiltin
> can you call the overload which has the bool parameter at the end and set i
Done


http://gerrit.cloudera.org:8080/#/c/3328/5/fe/src/test/java/com/cloudera/impala/planner/PlannerTest.java
File fe/src/test/java/com/cloudera/impala/planner/PlannerTest.java:

Line 57:     runPlannerTestFile("analytic-fns-ignore-nulls");
> Any reason these are separate files/tests? If not, can you merge them to re
Done


http://gerrit.cloudera.org:8080/#/c/3328/5/testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test:

PS5, Line 743: ASC NULLS LAST
> I don't think this is necessarily an issue, but it's weird to me that now t
The reason that this happens is that when OrderByElement.reverse creates new 
OrderByElements it always assigns either true or false to 'nullsFirst', even if 
the original OrderByElement's 'nullsFirst' was NULL.

As you say, though, "NULLS LAST" is the default for ASC, and this plan is the 
same from the perspective of the backend as an equivalent plan where "NULLS 
LAST" wasn't explicit, so I don't think it should matter.


PS5, Line 1545: 07:ANALYTIC
              : |  functions: first_value(bigint_col)
              : |  partition by: tinyint_col
              : |  order by: id ASC
              : |  window: RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED 
FOLLOWING
              : |
              : 06:ANALYTIC
              : |  functions: first_value(bigint_col)
              : |  partition by: tinyint_col
              : |  order by: id ASC
              : |  window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
              : |
              : 05:ANALYTIC
              : |  functions: first_value(int_col)
              : |  partition by: tinyint_col
              : |  order by: id ASC, bool_col DESC
> One of the simplifications we made does result in less grouping than we had
Its true that we get less grouping in this particular situation, and that in 
general this patch will result in more types of windows being used since its 
less aggressive about standardizing things (assuming that all possible input 
windows are used evenly, which is probably not true).

But, its not entirely clear that the changes that were made will result in less 
grouping overall, since first_value_rewrite couldn't be grouped with anything 
else (as enforced by AnalyticPlanner.requiresIndependentEval, which this patch 
removes).

It really depends on what combinations of bounds are more or less common.


-- 
To view, visit http://gerrit.cloudera.org:8080/3328
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic27525e2237fb54318549d2674f1610884208e9b
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-HasComments: Yes

Reply via email to