[ 
https://issues.apache.org/jira/browse/IMPALA-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-6179:
-------------------------------------

    Assignee:     (was: Pooja Nilangekar)

> Constant argument to UDAF not accessible in merge phase of distributed 
> execution
> --------------------------------------------------------------------------------
>
>                 Key: IMPALA-6179
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6179
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend, Frontend
>    Affects Versions: Impala 2.10.0
>            Reporter: Alexander Behm
>            Priority: Major
>         Attachments: 0001-Add-ConstTest-agg-fn.patch
>
>
> While adding a new built-in aggregation function I noticed that constant 
> arguments are not accessible in the merge phase of the aggregation, i.e., in 
> Init(), Merge() or Finalize() of the merge phase.
> In this example the 2nd constant argument is only accessible in the pre-agg 
> phase, but not in the merge agg phase.
> {code}
> [localhost:21000] > explain select const_test(bigint_col, 0.1) from 
> functional.alltypes;
> Query: explain select const_test(bigint_col, 0.1) from functional.alltypes
> +----------------------------------------------+
> | Explain String                               |
> +----------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=0B |
> | Per-Host Resource Estimates: Memory=148.00MB |
> | Codegen disabled by planner                  |
> |                                              |
> | PLAN-ROOT SINK                               |
> | |                                            |
> | 03:AGGREGATE [FINALIZE]                      |
> | |  output: const_test:merge(bigint_col, 0.1) |
> | |                                            |
> | 02:EXCHANGE [UNPARTITIONED]                  |
> | |                                            |
> | 01:AGGREGATE                                 |
> | |  output: const_test(bigint_col, 0.1)       |
> | |                                            |
> | 00:SCAN HDFS [functional.alltypes]           |
> |    partitions=24/24 files=24 size=478.45KB   |
> +----------------------------------------------+
> {code}
> With num_nodes=1 the constant argument is accessible in the Finalize() of the 
> single aggregation phase:
> {code}
> [localhost:21000] > explain select const_test(bigint_col, 0.1) from 
> functional.alltypes;
> Query: explain select const_test(bigint_col, 0.1) from functional.alltypes
> +----------------------------------------------+
> | Explain String                               |
> +----------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=0B |
> | Per-Host Resource Estimates: Memory=138.00MB |
> | Codegen disabled by planner                  |
> |                                              |
> | PLAN-ROOT SINK                               |
> | |                                            |
> | 01:AGGREGATE [FINALIZE]                      |
> | |  output: const_test(bigint_col, 0.1)       |
> | |                                            |
> | 00:SCAN HDFS [functional.alltypes]           |
> |    partitions=24/24 files=24 size=478.45KB   |
> +----------------------------------------------+
> {code}
> I've attached a patch produced with git format-patch to reproduce the above.
> It's not clear whether this behavior is a bug or intended. I can see 
> arguments both ways:
> * An aggregation function can take any number of arguments.
> * The Update() phase produces a single output slot.
> * The Merge() phase consumes the single slot produced by the Update() phase 
> and produces another output slot.
> * It would be quite convenient to have access to constant arguments of the 
> original SQL invocation in all phases of the aggregation.
> * However, this seems semantically at odds with non-constant arguments. For 
> non-constant arguments one would expect the Update() to aggregate/store 
> whatever state is needed for Merge() in the single output slot. So why should 
> that be different for constant arguments?
> * How would the planner decide which arguments to forward to the Merge() 
> phase? What would the BE aggregation function signatures look like? Today, 
> the Merge() phase always takes a single SlotRef as input.
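
One way to read the "store whatever state is needed for Merge()" argument above is that Update() could copy the constant into the intermediate state itself. The following is a minimal standalone sketch of that idea, not Impala code: ScaledSumState, the scaled-sum semantics, and RunDistributed() are hypothetical names made up for illustration, and the functions do not use the real UDAF signatures. Note the limitation that a pre-aggregation which sees no rows never records the constant, so Finalize() has to handle that case.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical intermediate state: the single value handed from the
// pre-aggregation to the merge aggregation.
struct ScaledSumState {
  int64_t sum = 0;
  double factor = 0.0;      // copy of the constant argument
  bool has_factor = false;  // stays false until Update() has seen a row
};

void Init(ScaledSumState* s) { *s = ScaledSumState(); }

// Pre-aggregation: the constant argument is visible here, so record it
// in the state alongside the running sum.
void Update(int64_t value, double factor, ScaledSumState* s) {
  s->sum += value;
  s->factor = factor;
  s->has_factor = true;
}

// Merge aggregation: only intermediate states are available as input,
// so the constant is recovered from the incoming state.
void Merge(const ScaledSumState& src, ScaledSumState* dst) {
  dst->sum += src.sum;
  if (src.has_factor) {
    dst->factor = src.factor;
    dst->has_factor = true;
  }
}

// Finalize: uses the constant carried in the state; returns 0.0 if no
// pre-aggregation ever saw a row (and hence never recorded the constant).
double Finalize(const ScaledSumState& s) {
  return s.has_factor ? static_cast<double>(s.sum) * s.factor : 0.0;
}

// Simulates the two-phase plan: one pre-aggregation per partition,
// followed by a single merge aggregation.
double RunDistributed(const std::vector<std::vector<int64_t>>& partitions,
                      double factor) {
  ScaledSumState merged;
  Init(&merged);
  for (const auto& rows : partitions) {
    ScaledSumState partial;
    Init(&partial);
    for (int64_t v : rows) Update(v, factor, &partial);
    Merge(partial, &merged);
  }
  return Finalize(merged);
}
```

The cost of this approach is that every intermediate value carries a redundant copy of the constant, which is part of why forwarding constants to the merge phase directly might still be attractive.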



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
