[
https://issues.apache.org/jira/browse/IMPALA-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Armstrong reassigned IMPALA-6179:
-------------------------------------
Assignee: (was: Pooja Nilangekar)
> Constant argument to UDAF not accessible in merge phase of distributed
> execution
> --------------------------------------------------------------------------------
>
> Key: IMPALA-6179
> URL: https://issues.apache.org/jira/browse/IMPALA-6179
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend, Frontend
> Affects Versions: Impala 2.10.0
> Reporter: Alexander Behm
> Priority: Major
> Attachments: 0001-Add-ConstTest-agg-fn.patch
>
>
> While adding a new built-in aggregation function I noticed that constant
> arguments are not accessible in the merge phase of the aggregation, i.e., in
> Init(), Merge() or Finalize() of the merge phase.
> In this example the 2nd constant argument is only accessible in the pre-agg
> phase, but not in the merge agg phase.
> {code}
> [localhost:21000] > explain select const_test(bigint_col, 0.1) from
> functional.alltypes;
> Query: explain select const_test(bigint_col, 0.1) from functional.alltypes
> +----------------------------------------------+
> | Explain String                               |
> +----------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=0B |
> | Per-Host Resource Estimates: Memory=148.00MB |
> | Codegen disabled by planner                  |
> |                                              |
> | PLAN-ROOT SINK                               |
> | |                                            |
> | 03:AGGREGATE [FINALIZE]                      |
> | |  output: const_test:merge(bigint_col, 0.1) |
> | |                                            |
> | 02:EXCHANGE [UNPARTITIONED]                  |
> | |                                            |
> | 01:AGGREGATE                                 |
> | |  output: const_test(bigint_col, 0.1)       |
> | |                                            |
> | 00:SCAN HDFS [functional.alltypes]           |
> |    partitions=24/24 files=24 size=478.45KB   |
> +----------------------------------------------+
> {code}
> With num_nodes=1 the constant argument is accessible in the Finalize() of the
> single aggregation phase:
> {code}
> [localhost:21000] > explain select const_test(bigint_col, 0.1) from
> functional.alltypes;
> Query: explain select const_test(bigint_col, 0.1) from functional.alltypes
> +----------------------------------------------+
> | Explain String                               |
> +----------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=0B |
> | Per-Host Resource Estimates: Memory=138.00MB |
> | Codegen disabled by planner                  |
> |                                              |
> | PLAN-ROOT SINK                               |
> | |                                            |
> | 01:AGGREGATE [FINALIZE]                      |
> | |  output: const_test(bigint_col, 0.1)       |
> | |                                            |
> | 00:SCAN HDFS [functional.alltypes]           |
> |    partitions=24/24 files=24 size=478.45KB   |
> +----------------------------------------------+
> {code}
> I've attached a patch, produced with git format-patch, to reproduce the above.
> It's not clear whether this behavior is a bug or intended. I can see
> arguments both ways:
> * An aggregation function can take any number of arguments.
> * The Update() phase produces a single output slot.
> * The Merge() phase consumes the single slot produced by the Update() phase
> and produces another output slot.
> * It would be quite convenient to have access to constant arguments of the
> original SQL invocation in all phases of the aggregation.
> * However, this seems semantically at odds with non-constant arguments. For
> non-constant arguments one would expect the Update() to aggregate/store
> whatever state is needed for Merge() in the single output slot. So why should
> that be different for constant arguments?
> * How would the planner decide which arguments to forward to the Merge()
> phase? What would the BE aggregation function signatures look like? Today,
> the Merge() phase always takes a single SlotRef as input.
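
One way to picture the "store whatever state is needed for Merge() in the single output slot" argument from the list above is the rough sketch below. It is plain, standalone C++ and does not use Impala's UDF headers or its real function signatures; ScaledSumState, Update(), Merge(), Finalize() and RunDistributedExample() are all hypothetical names chosen for illustration. The idea is only that the pre-aggregation copies the constant into the intermediate state, so the merge phase can read it from there instead of from the original argument list.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical intermediate state: the single value that flows from the
// pre-aggregation to the merge aggregation.
struct ScaledSumState {
  int64_t sum = 0;
  double scale = 0.0;      // copy of the constant argument
  bool has_scale = false;  // false until Update() has seen at least one row
};

// Pre-aggregation step: sees both the column value and the constant, and
// stashes the constant in the state.
void Update(ScaledSumState* state, int64_t value, double scale) {
  state->sum += value;
  state->scale = scale;
  state->has_scale = true;
}

// Merge step: sees only intermediate states, never the original arguments.
void Merge(const ScaledSumState& src, ScaledSumState* dst) {
  dst->sum += src.sum;
  if (src.has_scale) {
    dst->scale = src.scale;
    dst->has_scale = true;
  }
}

// Finalize step of the merge phase: reads the constant from the state.
// Returns false if the constant never arrived (no input rows at all).
bool Finalize(const ScaledSumState& state, double* result) {
  if (!state.has_scale) return false;
  *result = static_cast<double>(state.sum) * state.scale;
  return true;
}

// Simulates a two-phase plan: one pre-aggregation per partition, followed by
// a single merge aggregation.
bool RunDistributedExample(const std::vector<std::vector<int64_t>>& partitions,
                           double scale, double* result) {
  ScaledSumState merged;
  for (const auto& rows : partitions) {
    ScaledSumState partial;
    for (int64_t value : rows) Update(&partial, value, scale);
    Merge(partial, &merged);
  }
  return Finalize(merged, result);
}
```

Calling RunDistributedExample() with two or more partitions mimics the distributed plan. Note the limitation this sketch exposes: when no partition contains any rows, Update() is never called, so the constant never reaches the merge phase and Finalize() has nothing to read. That case is one reason direct access to constant arguments in every phase would be convenient.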