[ 
https://issues.apache.org/jira/browse/IMPALA-13179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17898688#comment-17898688
 ] 

ASF subversion and git services commented on IMPALA-13179:
----------------------------------------------------------

Commit 68c42a5d660be4c411a61668af474ad43d730b69 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=68c42a5d6 ]

IMPALA-13179: Make non-deterministic functions ineligible for tuple caching

Non-deterministic functions should make a location ineligible
for caching. Existing definitions of non-determinism, such as
FunctionCallExpr.isNondeterministicBuiltinFn(), are not sufficient
here: for tuple caching, results must be stable over time and
across query boundaries, so a broader list of functions counts as
non-deterministic.

The following are considered non-deterministic in this change:
 1. Random functions like rand/random/uuid
 2. Current time functions like now/current_timestamp
 3. Session/system information like current_user/pid/coordinator
 4. AI functions
 5. UDFs
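
As a rough illustration of the classification above (not the code from
this commit), the check could be sketched as follows. The class, the
FnCall type, and the contents of the name set are hypothetical; only
the five categories come from the commit message.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class TupleCacheDeterminismSketch {
    // Hypothetical subset of built-ins taken from the examples in the
    // commit message; the real list in Impala may be longer.
    private static final Set<String> NON_DETERMINISTIC_BUILTINS =
        new HashSet<>(Arrays.asList(
            "rand", "random", "uuid",            // random functions
            "now", "current_timestamp",          // current time functions
            "current_user", "pid", "coordinator" // session/system information
        ));

    /** Hypothetical stand-in for a function call found in a plan node. */
    static final class FnCall {
        final String name;
        final boolean isBuiltin;
        final boolean isAiFunction;

        FnCall(String name, boolean isBuiltin, boolean isAiFunction) {
            this.name = name;
            this.isBuiltin = isBuiltin;
            this.isAiFunction = isAiFunction;
        }
    }

    /** Returns true if the call may produce different results across queries. */
    static boolean isNonDeterministic(FnCall fn) {
        if (!fn.isBuiltin) return true;    // UDFs are treated as non-deterministic
        if (fn.isAiFunction) return true;  // AI functions
        return NON_DETERMINISTIC_BUILTINS.contains(fn.name.toLowerCase(Locale.ROOT));
    }

    /** A location is eligible only if none of its function calls is non-deterministic. */
    static boolean isEligibleForCaching(List<FnCall> calls) {
        for (FnCall fn : calls) {
            if (isNonDeterministic(fn)) return false;
        }
        return true;
    }
}
```

Treating every UDF as non-deterministic is the conservative choice listed
above: the planner cannot inspect what a UDF does, so it assumes the result
may change between queries.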

With enable_expr_rewrites=true, constant folding can replace
some of these with a single constant (e.g. now() becomes a specific
timestamp). This is not a correctness problem for tuple caching,
because the specific value is incorporated into the cache key.
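
To illustrate why a folded constant is safe, here is a minimal sketch that
derives a key by hashing the text of a plan fragment. The class name, the
use of SHA-256, and the plan strings are assumptions for illustration only;
the commit message does not describe how Impala actually computes the key.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FoldedConstantKeySketch {
    /**
     * Hypothetical key derivation: hash the textual form of a plan fragment.
     * Any constant that was folded into the plan becomes part of the hashed
     * input, so different folded values produce different keys.
     */
    static String cacheKey(String planFragmentText) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(planFragmentText.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two runs of the same query text, with now() folded to different timestamps.
        String first = cacheKey("SCAN t WHERE ts < TIMESTAMP '2024-11-15 10:00:00'");
        String second = cacheKey("SCAN t WHERE ts < TIMESTAMP '2024-11-15 10:00:05'");
        System.out.println(first.equals(second)); // the keys differ, so no stale reuse
    }
}
```

Because the two folded timestamps hash to different keys, the second run
cannot read the entry written by the first. The cost is the performance
concern noted in the issue description: such entries are unlikely to be
reused.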

Testing:
 - Added test cases to TupleCacheTest

Change-Id: I9601dba87b3c8f24cbe42eca0d8070db42b50488
Reviewed-on: http://gerrit.cloudera.org:8080/22011
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Disable tuple caching when using non-deterministic functions
> ------------------------------------------------------------
>
>                 Key: IMPALA-13179
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13179
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 4.5.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Major
>
> Some functions are non-deterministic, so tuple caching needs to detect those 
> functions and avoid caching at locations that are non-deterministic.
> There are two different pieces:
>  # Correctness: If the key is constant but the results can be variable, then 
> that is a correctness issue. That can happen for genuinely random functions 
> like uuid(). It can happen when timestamp functions like now() are evaluated 
> at runtime.
>  # Performance: The frontend does constant-folding of functions that don't 
> vary during execution, so something like now() might be replaced by a 
> hard-coded constant. This means that the key contains something that varies 
> frequently. That can be a performance issue, because we can be caching things 
> that cannot be reused. This doesn't have the same correctness issue.
> This ticket is focused on the correctness piece. If uuid()/now()/etc. are 
> referenced and would be evaluated at runtime, the location should be 
> ineligible for tuple caching.


