[ https://issues.apache.org/jira/browse/IMPALA-13185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876856#comment-17876856 ]

Michael Smith commented on IMPALA-13185:
----------------------------------------

A simple example of this problem from TPC-DS:
{code:sql}
with ssales as (
  select sum(ss_net_paid) netpaid, ss_store_sk, i_color from store_sales, item
    where ss_item_sk = i_item_sk group by ss_store_sk, i_color)
select sum(netpaid) from ssales where i_color = 'peach' group by ss_store_sk 
order by ss_store_sk;

with ssales as (
  select sum(ss_net_paid) netpaid, ss_store_sk, i_color from store_sales, item
    where ss_item_sk = i_item_sk group by ss_store_sk, i_color)
select sum(netpaid) from ssales where i_color = 'saddle' group by ss_store_sk 
order by ss_store_sk; {code}
The {{where}} conjunct differs between the two queries ("peach" vs. "saddle"). In 
the first query, the {{i_item_sk}} values selected by {{i_color = 'peach'}} are 
delivered as a runtime filter to the {{store_sales}} scan, so only the matching 
rows are cached. The second query reuses the cached scan result from the first, 
but those rows all correspond to "peach", not "saddle", so the query returns 0 
rows.
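A minimal Java sketch of the collision, under stated assumptions: the cache key is a SHA-256 digest over a canonical description of the fragment, and a runtime filter is represented by the key of the build side that produces it. {{TupleCacheKeySketch}}, {{hash}}, {{scanKeyIgnoringFilters}} and {{scanKeyWithFilter}} are hypothetical names, not Impala's actual tuple cache code.

{code:java}
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: names and key layout are illustrative, not Impala's API.
public final class TupleCacheKeySketch {

  // Assumed: a key is a SHA-256 digest over canonical plan descriptions.
  static String hash(String... parts) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      for (String part : parts) {
        md.update(part.getBytes(StandardCharsets.UTF_8));
        md.update((byte) 0); // separator so ("ab", "c") != ("a", "bc")
      }
      return new BigInteger(1, md.digest()).toString(16);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 must be available on every JVM", e);
    }
  }

  // Buggy key: describes only the probe-side scan itself.
  static String scanKeyIgnoringFilters(String scanDesc) {
    return hash(scanDesc);
  }

  // Fixed key: also folds in the key of the build side that produces
  // the runtime filter consumed by this scan.
  static String scanKeyWithFilter(String scanDesc, String filterBuildKey) {
    return hash(scanDesc, "runtime-filter", filterBuildKey);
  }

  public static void main(String[] args) {
    // Both queries scan store_sales identically; only the build side differs.
    String scan = "SCAN store_sales";
    String peachBuild = hash("SCAN item", "i_color = 'peach'");
    String saddleBuild = hash("SCAN item", "i_color = 'saddle'");

    System.out.println("ignoring filters, keys collide: "
        + scanKeyIgnoringFilters(scan).equals(scanKeyIgnoringFilters(scan)));
    System.out.println("with filter, keys collide: "
        + scanKeyWithFilter(scan, peachBuild).equals(scanKeyWithFilter(scan, saddleBuild)));
  }
}
{code}

Because both queries describe the {{store_sales}} scan identically, the key that ignores filters is the same for both; folding in the build-side key, which carries the {{i_color}} predicate, makes the two keys differ.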

> Tuple cache keys need to incorporate runtime filter information
> ---------------------------------------------------------------
>
>                 Key: IMPALA-13185
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13185
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 4.5.0
>            Reporter: Joe McDonnell
>            Assignee: Michael Smith
>            Priority: Major
>
> If a runtime filter impacts the results of a fragment, then the tuple cache 
> key needs to incorporate information about the generation of that runtime 
> filter. This needs to include information about the base tables that impact 
> the runtime filter.
> For example, suppose there is a join. The build side of the join produces a 
> runtime filter that gets delivered to the probe side of the join. The tuple 
> cache key for the probe side of the join will need to include a 
> representation of the runtime filter. If the table on the build side of the 
> join changes, the tuple cache key for the probe side needs to change due to 
> the possible difference in the runtime filter.
> This can also impact eligibility. In theory, the build side of a join could 
> be constructed from a source with a limit specified, which can make it 
> non-deterministic. When the runtime filter is built non-deterministically, 
> its consumer is also non-deterministic and can't participate in tuple 
> caching.
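A hypothetical Java sketch of both requirements in the description: the probe-side key embeds a serialization of the filter's build side, including an assumed per-table version id, so a change to a build-side base table changes the key; and a consumer whose build side contains a limit gets no key at all. {{PlanNode}}, {{tableVersion}}, {{buildSideKey}}, {{isDeterministic}} and {{probeKey}} are illustrative names only, not Impala's planner classes.

{code:java}
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: PlanNode and its fields are illustrative only.
public final class RuntimeFilterKeySketch {

  static final class PlanNode {
    final String desc;          // canonical description, e.g. "SCAN item"
    final String tableVersion;  // assumed version id of a scanned table; null otherwise
    final boolean hasLimit;     // any LIMIT is conservatively non-deterministic
    final List<PlanNode> children;

    PlanNode(String desc, String tableVersion, boolean hasLimit, List<PlanNode> children) {
      this.desc = desc;
      this.tableVersion = tableVersion;
      this.hasLimit = hasLimit;
      this.children = children;
    }
  }

  // Serializes the filter's build side, including base-table versions, so a
  // change to any table feeding the filter changes the probe-side key.
  static String buildSideKey(PlanNode node) {
    StringBuilder sb = new StringBuilder(node.desc);
    if (node.tableVersion != null) {
      sb.append('@').append(node.tableVersion);
    }
    sb.append('(');
    for (PlanNode child : node.children) {
      sb.append(buildSideKey(child)).append(';');
    }
    return sb.append(')').toString();
  }

  // The build side is deterministic only if no node under it has a limit.
  static boolean isDeterministic(PlanNode node) {
    if (node.hasLimit) {
      return false;
    }
    for (PlanNode child : node.children) {
      if (!isDeterministic(child)) {
        return false;
      }
    }
    return true;
  }

  // Probe-side key, or empty when the filter's build side is
  // non-deterministic and the consumer must not be cached.
  static Optional<String> probeKey(String probeDesc, PlanNode filterBuildSide) {
    if (!isDeterministic(filterBuildSide)) {
      return Optional.empty();
    }
    return Optional.of(probeDesc + " | RF<" + buildSideKey(filterBuildSide) + ">");
  }

  public static void main(String[] args) {
    PlanNode itemV1 = new PlanNode("SCAN item", "v1", false, List.of());
    PlanNode itemV2 = new PlanNode("SCAN item", "v2", false, List.of());
    PlanNode limited = new PlanNode("SCAN item", "v1", true, List.of());

    System.out.println(probeKey("SCAN store_sales", itemV1));
    System.out.println(probeKey("SCAN store_sales", itemV2));
    System.out.println(probeKey("SCAN store_sales", limited));
  }
}
{code}

The check conservatively treats any limit on the build side as non-deterministic, matching the example in the description. A plain string key keeps the sketch readable; an actual implementation would more likely hash it.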



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
