[
https://issues.apache.org/jira/browse/IMPALA-13179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17898688#comment-17898688
]
ASF subversion and git services commented on IMPALA-13179:
----------------------------------------------------------
Commit 68c42a5d660be4c411a61668af474ad43d730b69 in impala's branch
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=68c42a5d6 ]
IMPALA-13179: Make non-deterministic functions ineligible for tuple caching
Non-deterministic functions should make a location ineligible
for caching. Existing definitions of non-determinism, such as
FunctionCallExpr.isNondeterministicBuiltinFn(), are not sufficient:
for tuple caching, non-determinism must be considered over time and
across query boundaries, so the list of functions is broader.
The following are considered non-deterministic in this change:
1. Random functions like rand/random/uuid
2. Current time functions like now/current_timestamp
3. Session/system information like current_user/pid/coordinator
4. AI functions
5. UDFs
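The five categories above can be sketched as a simple eligibility check. It walks the expressions at a plan location and marks the location ineligible if any function call falls into one of the categories. This is a minimal sketch, not Impala's implementation. The `Expr`, `FnCall`, and `Literal` types and the `NONDETERMINISTIC_BUILTINS` set are hypothetical, and treating every `ai_`-prefixed name as an AI function is an assumption. It requires Java 17.

```java
import java.util.List;
import java.util.Set;

public class TupleCacheEligibility {
    // Hypothetical expression model: a literal, or a function call with children.
    sealed interface Expr permits Literal, FnCall {}
    record Literal(String value) implements Expr {}
    record FnCall(String name, boolean isUdf, List<Expr> children) implements Expr {}

    // Only the builtins named in the commit message; a real list would be longer.
    static final Set<String> NONDETERMINISTIC_BUILTINS = Set.of(
        "rand", "random", "uuid",              // random functions
        "now", "current_timestamp",            // current time functions
        "current_user", "pid", "coordinator"); // session/system information

    static boolean isNonDeterministic(FnCall fn) {
        if (fn.isUdf()) return true;             // UDFs: results cannot be trusted to be stable
        String name = fn.name().toLowerCase();
        if (name.startsWith("ai_")) return true; // assumption: AI functions share an "ai_" prefix
        return NONDETERMINISTIC_BUILTINS.contains(name);
    }

    // A location is eligible only if no expression at it contains a non-deterministic call.
    static boolean isEligible(Expr expr) {
        if (expr instanceof FnCall fn) {
            if (isNonDeterministic(fn)) return false;
            for (Expr child : fn.children()) {
                if (!isEligible(child)) return false;
            }
        }
        return true; // literals (including constant-folded values) are safe
    }

    public static void main(String[] args) {
        Expr plain = new FnCall("upper", false, List.of(new Literal("'abc'")));
        Expr random = new FnCall("concat", false,
            List.of(new Literal("'id-'"), new FnCall("uuid", false, List.of())));
        Expr udf = new FnCall("my_udf", true, List.of(new Literal("1")));
        System.out.println(isEligible(plain));  // true
        System.out.println(isEligible(random)); // false
        System.out.println(isEligible(udf));    // false
    }
}
```

The recursive walk matters because a non-deterministic call nested anywhere inside an otherwise deterministic expression, such as `uuid()` inside `concat()`, still makes the whole location unsafe to cache.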
With enable_expr_rewrites=true, constant folding can replace
some of these with a single constant (e.g. now() becomes a specific
timestamp). This is not a correctness problem for tuple caching,
because the specific value is incorporated into the cache key.
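The following toy cache key shows why constant folding is safe. It assumes, hypothetically, that the key is a SHA-256 hash of the location's textual form; the commit does not describe how Impala actually builds its key. Folding substitutes the specific timestamp before the key is computed, so queries run at different times get different keys. A stale entry therefore cannot be returned, and the worst case is an entry that is never reused. `cacheKey` and `foldNow` are illustrative helpers, and the code requires Java 17 for `HexFormat`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public class FoldedKeyDemo {
    // Hypothetical key: SHA-256 of the textual form of the plan location.
    static String cacheKey(String planText) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(planText.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every JVM", e);
        }
    }

    // Mimics constant folding: replace now() with the query's start time as a literal.
    static String foldNow(String planText, String queryStartTime) {
        return planText.replace("now()", "TIMESTAMP '" + queryStartTime + "'");
    }

    public static void main(String[] args) {
        String plan = "SCAN t WHERE ts < now()";
        String run1 = foldNow(plan, "2024-01-01 10:00:00");
        String run2 = foldNow(plan, "2024-01-01 10:05:00");
        // Different folded constants give different keys, so results never cross queries.
        System.out.println(cacheKey(run1).equals(cacheKey(run2))); // false
    }
}
```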
Testing:
- Added test cases to TupleCacheTest
Change-Id: I9601dba87b3c8f24cbe42eca0d8070db42b50488
Reviewed-on: http://gerrit.cloudera.org:8080/22011
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Disable tuple caching when using non-deterministic functions
> ------------------------------------------------------------
>
> Key: IMPALA-13179
> URL: https://issues.apache.org/jira/browse/IMPALA-13179
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 4.5.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Major
>
> Some functions are non-deterministic, so tuple caching needs to detect those
> functions and avoid caching at locations that are non-deterministic.
> There are two different pieces:
> # Correctness: If the key is constant but the results can vary, then
> that is a correctness issue. That can happen for genuinely random functions
> like uuid(). It can also happen when timestamp functions like now() are
> evaluated at runtime.
> # Performance: The frontend constant-folds functions that don't
> vary within a single execution, so something like now() might be replaced by
> a hard-coded constant. This means that the key contains something that varies
> frequently. That can be a performance issue, because we may cache results
> that can never be reused. This doesn't have the same correctness issue.
> This ticket is focused on the correctness piece. If uuid()/now()/etc. are
> referenced and would be evaluated at runtime, the location should be
> ineligible for tuple caching.
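The correctness problem can be reproduced with a toy result cache. In this sketch, every name is hypothetical and none of it is Impala code. The key is the unevaluated expression text. When uuid() is evaluated at runtime, the key stays the same while the result changes, so a naive cache hands back a stale value. Treating the location as ineligible, meaning it is never cached, restores fresh results.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.Supplier;

public class RuntimeNondeterminismDemo {
    private final Map<String, String> cache = new HashMap<>();

    // Returns the cached result for key if present; otherwise evaluates and, if eligible, stores it.
    String evaluate(String key, Supplier<String> runtimeEval, boolean eligible) {
        if (eligible) {
            return cache.computeIfAbsent(key, k -> runtimeEval.get());
        }
        return runtimeEval.get(); // ineligible: always recompute, never cache
    }

    public static void main(String[] args) {
        RuntimeNondeterminismDemo demo = new RuntimeNondeterminismDemo();
        Supplier<String> uuid = () -> UUID.randomUUID().toString();

        // Same key, variable result: the second "query" wrongly reuses the first UUID.
        String a = demo.evaluate("SELECT uuid()", uuid, true);
        String b = demo.evaluate("SELECT uuid()", uuid, true);
        System.out.println(a.equals(b)); // true (stale reuse)

        // Marking the location ineligible yields a fresh value each time.
        String c = demo.evaluate("SELECT uuid()", uuid, false);
        String d = demo.evaluate("SELECT uuid()", uuid, false);
        System.out.println(c.equals(d)); // false, barring a UUID collision
    }
}
```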
--
This message was sent by Atlassian Jira
(v8.20.10#820010)