Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-3210: last/first_value() support for IGNORE NULLS ......................................................................
Patch Set 7: (7 comments) http://gerrit.cloudera.org:8080/#/c/3328/7/fe/src/main/java/com/cloudera/impala/analysis/AnalyticExpr.java File fe/src/main/java/com/cloudera/impala/analysis/AnalyticExpr.java: Line 592: * 4. UNBOUNDED FOLLOWING windows: > combine 3 and 4, it looks like the same transformation is applied Done Line 661: // 3. > see above Done http://gerrit.cloudera.org:8080/#/c/3328/7/fe/src/main/java/com/cloudera/impala/catalog/BuiltinsDb.java File fe/src/main/java/com/cloudera/impala/catalog/BuiltinsDb.java: Line 491: private static final Map<Type, String> LAST_VALUE_IGNORE_NULLS_INIT_SYMBOL = > instead of registering a new (wrapper) function, why not simply have the ex We wanted to avoid adding anything to the last_value implementation that would slow down the common case of not-'ignore nulls', such as having to check a boolean flag on every call to Add, Remove, and Get on every row of the table, and we also wanted to avoid putting any special cases in the backend for this. I don't mind changing it if you think that's a cleaner design, though. Line 979: prefix + UPDATE_VAL_SYMBOL.get(t), > is update_val_symbol used anywhere else? No. UpdateVal is the same function as LastValUpdate. We renamed it because the implementation is being reused in the backend, and it makes it clearer what's going on there. http://gerrit.cloudera.org:8080/#/c/3328/7/testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test File testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test: Line 1555: | window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW > why is this one not coalesced w/ the parent analytic node? In the input query, none of the functions have the same window and ordering. Previously, we were able to coalesce two of the first_value calls because we were rewriting first_value(unbounded preceding to unbounded following) to first_value(unbounded preceding to current row), which are equivalent in the non-'ignore nulls' case. They are not equivalent in the 'ignore nulls' case, though (eg. it may be the case that every value up to and including current row is NULL but some following value is non-null such that the correct value to return for the window (ub preceding to ub following) wouldn't even be in the window (ub preceding to current row). We thought that it was preferable to keep the code simple by doing the same standardization in the 'ignore nulls' and non-'ignore nulls' cases, but certainly I could add this standardization back in just for the non-'ignore nulls' case if we think its important. However, I actually ran a performance test with this query (modified to run on tpch since functional.alltypesagg is small) and the new version is actually slightly faster, because it performs 2 sorts instead of 3, and the sorts are more expensive than the analytic nodes, so its hard to say what the performance effects of these changes will be. +-----------------+-------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+ | Workload | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters | +-----------------+-------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+ | TARGETED-PERF() | PLANNER_TEST_PERF | parquet / none / none | 40.03 | 42.24 | -5.25% | 0.14% | 1.40% | 1 | 3 | +-----------------+-------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+ Line 2187: # Start LAST_VALUE IGNORE NULLS > is there a reason this needs new test cases (or could this be incorporated Done Line 2190: last_value(tinyint_col ignore nulls) over (order by id rows between unbounded preceding and 1 preceding) > long line (and elsewhere) Done -- To view, visit http://gerrit.cloudera.org:8080/3328 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic27525e2237fb54318549d2674f1610884208e9b Gerrit-PatchSet: 7 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-HasComments: Yes
