[ 
https://issues.apache.org/jira/browse/IMPALA-13181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Smith resolved IMPALA-13181.
------------------------------------
    Fix Version/s: Impala 4.5.0
       Resolution: Fixed

> Disable tuple caching for locations that have a limit
> -----------------------------------------------------
>
>                 Key: IMPALA-13181
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13181
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 4.5.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Major
>             Fix For: Impala 4.5.0
>
>
> Statements that use a limit are non-deterministic unless there is a sort. 
> Locations with limits should be marked ineligible for tuple caching.
> As an example, for a hash join, suppose the build side has a limit. This 
> means that the build side could vary from run to run. A requirement for our 
> correctness is that all nodes agree on the contents of the build side. The 
> variability of the limit is a problem for the build side, because if one node 
> hits the cache and another does not, there is no guarantee that they agree on 
> the contents of the build side.
> Concrete example: 
> {noformat}
> select a.l_orderkey from (select l_orderkey from tpch_parquet.lineitem limit 
> 10) a, tpch_parquet.orders b where a.l_orderkey = b.o_orderkey;{noformat}
> There are times when limits are deterministic or the non-determinism is 
> harmless. It is safer to ban it completely at first. In a future change, this 
> rule can be relaxed to allow caching in those cases.
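
For illustration only, the rule described above could be sketched roughly as follows. This is not Impala's frontend code: PlanNode and mark_eligibility are hypothetical names, and the upward propagation of ineligibility to ancestors is an assumption that the issue text does not state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PlanNode:
    """Hypothetical plan node; not Impala's actual PlanNode class."""
    name: str
    limit: Optional[int] = None  # None means no LIMIT at this node
    children: List["PlanNode"] = field(default_factory=list)


def mark_eligibility(node: PlanNode,
                     result: Optional[Dict[str, bool]] = None) -> Dict[str, bool]:
    """Return {node name: eligible for tuple caching} for the whole subtree.

    Rule from the issue: a location that has a limit is ineligible.
    Assumption (not from the issue): ineligibility also propagates to
    ancestors, because their output depends on the limited subtree.
    """
    if result is None:
        result = {}
    children_ok = True
    for child in node.children:
        # Visit every child (no short-circuit) so each node gets an entry.
        mark_eligibility(child, result)
        children_ok = children_ok and result[child.name]
    result[node.name] = node.limit is None and children_ok
    return result


# Plan shape loosely mirroring the concrete example in the issue:
# the limited lineitem scan feeds one side of the join with orders.
lineitem_scan = PlanNode("scan lineitem", limit=10)
orders_scan = PlanNode("scan orders")
join = PlanNode("hash join", children=[orders_scan, lineitem_scan])

eligibility = mark_eligibility(join)
```

In this sketch the orders scan stays eligible, while the limited lineitem scan and the join above it do not. Relaxing the rule later for deterministic limits (for example, a limit applied after a sort) would mean replacing the simple `node.limit is None` check with a finer test.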



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
