[
https://issues.apache.org/jira/browse/BEAM-12647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385180#comment-17385180
]
Kyle Weaver commented on BEAM-12647:
------------------------------------
It seems no results are being produced because aggregate functions are
implemented using KV pairs. All values are assigned a key, then a GBK
(GroupByKey) is followed by
[Combine.GroupedValues|https://beam.apache.org/releases/javadoc/2.31.0/org/apache/beam/sdk/transforms/Combine.GroupedValues.html].
Beam always treats key/values as a pair. There can be values without keys, but
there's no concept of a key with no values. So Combine.GroupedValues on an
empty PCollection correctly returns nothing.
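To illustrate why the keyed path yields nothing on empty input, here is a minimal plain-Python model of the semantics described above. It does not call the Beam API; the function names (group_by_key, combine_grouped_values, count_with_single_key) and the use of an empty tuple as a stand-in for the empty Row key are assumptions made for this sketch.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Model of a GBK: collect the values seen for each key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # A key only exists here if at least one value carried it.
    return dict(groups)


def combine_grouped_values(groups, combine_fn):
    """Model of a per-key combine: one output per key that is present."""
    return [(key, combine_fn(values)) for key, values in groups.items()]


def count_with_single_key(rows):
    """Model of COUNT(*) without GROUP BY, done through the keyed path."""
    empty_key = ()  # hypothetical stand-in for the empty Row key
    keyed = [(empty_key, row) for row in rows]
    return combine_grouped_values(group_by_key(keyed), len)


if __name__ == "__main__":
    print(count_with_single_key(["a", "b"]))
    print(count_with_single_key([]))
```

With no input rows, no key is ever created, so the per-key combine has nothing to emit and the result is an empty collection rather than a count of 0.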
The question is, why do we always use Combine.GroupedValues even when we aren't
using GROUP BY? Currently, if GROUP BY is omitted, we assign a single key
(K = the empty Row) to all elements. If I understand correctly, that would also
be bad for performance. So I think the fix here is to use Combine.Globally when
there's no GROUP BY.
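As a rough sketch of why a global combine would behave differently, the plain-Python model below starts from an accumulator instead of from keys. It is not Beam code; combine_globally and count_globally are hypothetical names, and the model assumes the default behavior in which a global combine over an empty input in the global window still emits the combiner's default value.

```python
def combine_globally(rows, create_accumulator, add_input, extract_output):
    """Model of a global combine: always starts from a fresh accumulator."""
    acc = create_accumulator()
    for row in rows:
        acc = add_input(acc, row)
    # Exactly one output, even when no rows were seen (assumed default behavior).
    return [extract_output(acc)]


def count_globally(rows):
    """Model of COUNT(*) without GROUP BY, done through a global combine."""
    return combine_globally(
        rows,
        create_accumulator=lambda: 0,
        add_input=lambda acc, _row: acc + 1,
        extract_output=lambda acc: acc,
    )


if __name__ == "__main__":
    print(count_globally(["a", "b", "c"]))
    print(count_globally([]))
```

Because the accumulator exists before any element arrives, the empty case produces the accumulator's initial value, which for a count is 0.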
> Aggregations on empty pcoll don't return a value.
> -------------------------------------------------
>
> Key: BEAM-12647
> URL: https://issues.apache.org/jira/browse/BEAM-12647
> Project: Beam
> Issue Type: Bug
> Components: dsl-sql, dsl-sql-zetasql
> Reporter: Kyle Weaver
> Priority: P3
>
> {{For example, "SELECT COUNT(\*) FROM table_empty" should return 0, but
> instead it returns no value.}}
> cc [~benglez] [~apilloud]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)