[
https://issues.apache.org/jira/browse/IMPALA-13185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876856#comment-17876856
]
Michael Smith commented on IMPALA-13185:
----------------------------------------
A simple example of this problem from TPC-DS:
{code:sql}
-- Query 1: filters on i_color = 'peach'
with ssales as (
select sum(ss_net_paid) netpaid, ss_store_sk, i_color from store_sales, item
where ss_item_sk = i_item_sk group by ss_store_sk, i_color)
select sum(netpaid) from ssales where i_color = 'peach' group by ss_store_sk
order by ss_store_sk;

-- Query 2: identical except for i_color = 'saddle'
with ssales as (
select sum(ss_net_paid) netpaid, ss_store_sk, i_color from store_sales, item
where ss_item_sk = i_item_sk group by ss_store_sk, i_color)
select sum(netpaid) from ssales where i_color = 'saddle' group by ss_store_sk
order by ss_store_sk;
{code}
The {{where}} conjunct differs between the two queries ('peach' vs 'saddle').
In the first query, the list of {{i_item_sk}} values selected by
{{i_color = 'peach'}} is delivered as a runtime filter, and only the rows
matching it are cached. The second query reuses the cached scan result from the
first, but the rows are wrong because they all correspond to 'peach', not
'saddle', so the query returns 0 rows.
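
The collision can be reproduced with a small toy model. This is only a sketch of the idea, not Impala's implementation: the table contents, {{scan_key}}, {{cached_scan}} and the key format are all hypothetical, and a real key would also need to reflect the build-side table's contents, which this model leaves out.

```python
import hashlib

# Toy stand-ins for the TPC-DS tables; all values are made up.
ITEM = [(1, "peach"), (2, "saddle"), (3, "peach")]  # (i_item_sk, i_color)
STORE_SALES = [  # (ss_item_sk, ss_store_sk, ss_net_paid)
    (1, 10, 5.0), (2, 10, 7.0), (3, 11, 2.0), (2, 11, 4.0),
]


def build_runtime_filter(color):
    """Build side: the set of i_item_sk values selected by the i_color predicate."""
    return {item_sk for item_sk, item_color in ITEM if item_color == color}


def scan_key(table, filter_source=None):
    """Hypothetical cache key: the scan description plus, optionally,
    a description of how the runtime filter was produced."""
    text = f"scan:{table}"
    if filter_source is not None:
        text += f"|rf:{filter_source}"
    return hashlib.sha256(text.encode()).hexdigest()


def cached_scan(cache, color, include_filter_in_key):
    """Scan store_sales through the cache, then join against the build side."""
    runtime_filter = build_runtime_filter(color)
    source = f"item where i_color='{color}'" if include_filter_in_key else None
    key = scan_key("store_sales", source)
    if key not in cache:
        # Cache miss: only rows that pass the runtime filter are stored.
        cache[key] = [row for row in STORE_SALES if row[0] in runtime_filter]
    # The join still probes against the current build side, so stale
    # cached rows from a different filter are dropped here.
    return [row for row in cache[key] if row[0] in runtime_filter]


# Key ignores the runtime filter: the 'saddle' query hits the 'peach' entry.
broken = {}
cached_scan(broken, "peach", include_filter_in_key=False)
print(cached_scan(broken, "saddle", include_filter_in_key=False))

# Key includes the runtime filter's source: the two queries get separate entries.
fixed = {}
cached_scan(fixed, "peach", include_filter_in_key=True)
print(cached_scan(fixed, "saddle", include_filter_in_key=True))
```

With the filter left out of the key, both queries map to the same cache entry and the 'saddle' query sees only 'peach' rows, which the join then discards. Including a description of the filter's source in the key gives each query its own entry.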
> Tuple cache keys need to incorporate runtime filter information
> ---------------------------------------------------------------
>
> Key: IMPALA-13185
> URL: https://issues.apache.org/jira/browse/IMPALA-13185
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 4.5.0
> Reporter: Joe McDonnell
> Assignee: Michael Smith
> Priority: Major
>
> If a runtime filter impacts the results of a fragment, then the tuple cache
> key needs to incorporate information about the generation of that runtime
> filter. This needs to include information about the base tables that impact
> the runtime filter.
> For example, suppose there is a join. The build side of the join produces a
> runtime filter that gets delivered to the probe side of the join. The tuple
> cache key for the probe side of the join will need to include a
> representation of the runtime filter. If the table on the build side of the
> join changes, the tuple cache key for the probe side needs to change due to
> the possible difference in the runtime filter.
> This can also impact eligibility. In theory, the build side of a join could
> be constructed from a source with a limit specified, and this can result in
> non-determinism. Since the build of the runtime filter is not deterministic,
> the consumer of the runtime filter is not deterministic and can't participate
> in tuple caching.