Joe McDonnell created IMPALA-13181:
--------------------------------------
Summary: Disable tuple caching for locations that have a limit
Key: IMPALA-13181
URL: https://issues.apache.org/jira/browse/IMPALA-13181
Project: IMPALA
Issue Type: Bug
Components: Frontend
Affects Versions: Impala 4.5.0
Reporter: Joe McDonnell
Statements that use a limit are non-deterministic unless there is a sort.
Locations with limits should be marked ineligible for tuple caching.
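To make the first point concrete, here is a minimal Python sketch (not Impala code) that models a limit without a sort as "keep the first n rows in arrival order". It assumes, purely for illustration, that a scan reads three hypothetical scan ranges whose completion order differs between runs. The sorted variant returns the same rows either way.

```python
from itertools import islice
from typing import Iterable, List


def unsorted_limit(rows: Iterable[int], n: int) -> List[int]:
    """Keep the first n rows in arrival order, like a LIMIT with no ORDER BY."""
    return list(islice(rows, n))


def sorted_limit(rows: Iterable[int], n: int) -> List[int]:
    """Sort before limiting, like ORDER BY ... LIMIT n."""
    return sorted(rows)[:n]


# Hypothetical scan ranges; only the order in which they finish differs between runs.
scan_ranges = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
run1 = [row for rng in scan_ranges for row in rng]
run2 = [row for rng in reversed(scan_ranges) for row in rng]

print(unsorted_limit(run1, 3))  # [1, 2, 3]
print(unsorted_limit(run2, 3))  # [7, 8, 9]
print(sorted_limit(run1, 3) == sorted_limit(run2, 3))  # True
```

Both unsorted results are valid answers to the same query, which is exactly why a cached copy from one run cannot be assumed to match a fresh computation in another.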
As an example, for a hash join, suppose the build side has a limit. This means
that the build side could vary from run to run. A requirement for our
correctness is that all nodes agree on the contents of the build side. The
variability of the limit breaks this requirement: if one node hits the cache
(getting rows from an earlier run) and another does not (recomputing them now),
there is no guarantee that they agree on the contents of the build side.
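The failure mode can be sketched in a few lines of Python. This is a simplified model, not Impala's executor: it assumes a broadcast join in which each of two hypothetical executors builds its own hash table from the build side, and it stands in for the limited subquery with a LIMIT 2 over build-side keys 1 through 4. Executor 1 hits the tuple cache and gets rows from an earlier run; executor 2 misses and recomputes, getting a different but equally valid set of rows.

```python
from typing import List


def hash_join(build_keys: List[int], probe_keys: List[int]) -> List[int]:
    """Return the probe keys that find a match in a hash table built from build_keys."""
    table = set(build_keys)
    return [key for key in probe_keys if key in table]


# Both lists are valid results of "LIMIT 2" over build-side keys [1, 2, 3, 4].
build_from_cache = [1, 2]   # executor 1: tuple cache hit, rows from an earlier run
build_recomputed = [3, 4]   # executor 2: cache miss, rows from the current run

# The probe side (orders) is partitioned across the two executors; keys are unique.
probe_on_exec1 = [1, 2]
probe_on_exec2 = [3, 4]

# Each executor joins its probe partition against its own copy of the build side.
mixed = hash_join(build_from_cache, probe_on_exec1) + hash_join(build_recomputed, probe_on_exec2)

# If both executors agree on either build side, the result has at most 2 rows.
agree_on_cache = hash_join(build_from_cache, probe_on_exec1) + hash_join(build_from_cache, probe_on_exec2)
agree_on_recompute = hash_join(build_recomputed, probe_on_exec1) + hash_join(build_recomputed, probe_on_exec2)

print(mixed)               # [1, 2, 3, 4]
print(agree_on_cache)      # [1, 2]
print(agree_on_recompute)  # [3, 4]
```

The mixed result has four rows even though the build side was limited to two, so it matches no single execution of the query.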
Concrete example:
{noformat}
select a.l_orderkey
from (select l_orderkey from tpch_parquet.lineitem limit 10) a,
  tpch_parquet.orders b
where a.l_orderkey = b.o_orderkey;
{noformat}
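The rule proposed here can be sketched as a bottom-up pass over the plan tree. The `PlanNode` class and `mark_cache_eligibility` function below are hypothetical names for illustration, not Impala's frontend API. The sketch assumes that any node with a limit is ineligible (with no exception for sorted input, matching the conservative "ban it completely" approach) and that ineligibility propagates to every ancestor, since an ancestor's output depends on its inputs. It models the concrete query with the LIMIT 10 attached to the lineitem scan on the build side.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    """Hypothetical, simplified plan node; not Impala's PlanNode class."""
    name: str
    limit: Optional[int] = None
    children: List["PlanNode"] = field(default_factory=list)
    cache_eligible: bool = True


def mark_cache_eligibility(node: PlanNode) -> bool:
    """Post-order pass: a node is eligible only if it has no limit
    and every child is eligible."""
    # Use a list (not a generator) so every child is visited and marked,
    # even after an ineligible child is found.
    children_eligible = all([mark_cache_eligibility(c) for c in node.children])
    node.cache_eligible = children_eligible and node.limit is None
    return node.cache_eligible


# Simplified plan for the concrete query: orders is the probe side and the
# limited lineitem subquery is the build side (an assumption for this sketch).
orders_scan = PlanNode("SCAN tpch_parquet.orders")
lineitem_scan = PlanNode("SCAN tpch_parquet.lineitem", limit=10)
join = PlanNode("HASH JOIN", children=[orders_scan, lineitem_scan])

mark_cache_eligibility(join)
for node in (join, orders_scan, lineitem_scan):
    print(f"{node.name}: eligible={node.cache_eligible}")
```

In this model the orders scan stays cacheable; only the limited subtree and everything above it are excluded. Relaxing the rule later for deterministic or harmless limits would only change the per-node condition, while the upward propagation stays the same.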
There are times when limits are deterministic or the non-determinism is
harmless. It is safer to ban it completely at first. In a future change, this
rule can be relaxed to allow caching in those cases.