[
https://issues.apache.org/jira/browse/IMPALA-13181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Smith resolved IMPALA-13181.
------------------------------------
Fix Version/s: Impala 4.5.0
Resolution: Fixed
> Disable tuple caching for locations that have a limit
> -----------------------------------------------------
>
> Key: IMPALA-13181
> URL: https://issues.apache.org/jira/browse/IMPALA-13181
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 4.5.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Major
> Fix For: Impala 4.5.0
>
>
> Statements that use a limit are non-deterministic unless there is a sort.
> Locations with limits should be marked ineligible for tuple caching.
> As an example, for a hash join, suppose the build side has a limit. The
> contents of the build side can then vary from run to run. Correctness
> requires that all nodes agree on the contents of the build side. With a
> limit, if one node hits the cache and another does not, there is no
> guarantee that they agree on those contents.
> Concrete example:
> {noformat}
> select a.l_orderkey
> from (select l_orderkey from tpch_parquet.lineitem limit 10) a,
>      tpch_parquet.orders b
> where a.l_orderkey = b.o_orderkey;
> {noformat}
> There are times when limits are deterministic or the non-determinism is
> harmless. It is safer to ban it completely at first. In a future change, this
> rule can be relaxed to allow caching in those cases.
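
The conservative rule described above can be illustrated with a minimal sketch. This is not Impala's frontend code: `PlanNode` and `cache_eligible` are hypothetical names invented for illustration, and the sketch assumes that ineligibility propagates from a node to every location above it in the plan. It models the plan from the concrete example, where the build side of the hash join is a scan with `limit 10`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    """Hypothetical stand-in for a plan node; not Impala's actual class."""
    name: str
    limit: Optional[int] = None  # row limit applied at this node, if any
    children: List["PlanNode"] = field(default_factory=list)


def cache_eligible(node: PlanNode) -> bool:
    """Return False if this location or anything beneath it has a limit.

    Every limit is treated as a reason to skip caching, including limits
    that happen to be deterministic or harmless.
    Assumption: a limit below a location also makes that location ineligible.
    """
    if node.limit is not None:
        return False
    return all(cache_eligible(child) for child in node.children)


# Plan shape from the concrete example: the build side is a limited scan.
build = PlanNode("scan tpch_parquet.lineitem", limit=10)
probe = PlanNode("scan tpch_parquet.orders")
join = PlanNode("hash join", children=[probe, build])

for location in (probe, build, join):
    print(location.name, cache_eligible(location))
```

In this sketch the scan of `orders` stays eligible, while the limited scan and the join above it do not. Relaxing the rule later, as the description suggests, would mean refining the check on `limit` rather than changing the traversal.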