[
https://issues.apache.org/jira/browse/FLINK-29091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
lincoln lee updated FLINK-29091:
--------------------------------
Description:
RAND and RAND_INTEGER are declared as dynamic functions (isDynamicFunction
returns true). According to that declaration, they should be evaluated only
once at query level (not per record) in batch mode; FLINK-21713 made a similar
fix for temporal functions.
However, the current behavior is that of a fully non-deterministic function,
evaluated per record in both batch and streaming mode. Breaking the current
behavior is not a good choice, and the determinism of the RAND function also
differs across vendors:
[1] SQL Server: evaluated at query level, although it is treated as a
non-deterministic function
https://docs.microsoft.com/en-us/sql/relational-databases/user-defined-functions/deterministic-and-nondeterministic-functions?view=sql-server-ver16#built-in-function-determinism
[2] MySQL: evaluated at row level
https://dev.mysql.com/doc/refman/5.7/en/mathematical-functions.html#function_rand
[3] Oracle: evaluated at row level if no seed is specified, e.g.,
DBMS_RANDOM.normal, DBMS_RANDOM.value(1,10)
https://docs.oracle.com/database/timesten-18.1/TTPLP/d_random.htm#TTPLP71231
So simply keeping the current behavior and updating the definition of these two
functions to non-deterministic avoids any impact on users and makes the
semantics clear.
was:
RAND and RAND_INTEGER are declared as dynamic functions (isDynamicFunction
returns true). According to that declaration, they should be evaluated only
once at query level (not per record) in batch mode; FLINK-21713 made a similar
fix for temporal functions.
However, the current behavior is that of a fully non-deterministic function,
evaluated per record in both batch and streaming mode. Breaking the current
behavior is not a good choice, and the determinism of the RAND function also
differs across vendors:
[1] SQL Server: evaluated at query level, although it is treated as a
non-deterministic function
https://docs.microsoft.com/en-us/sql/relational-databases/user-defined-functions/deterministic-and-nondeterministic-functions?view=sql-server-ver16#built-in-function-determinism
[2] MySQL: evaluated at row level
https://dev.mysql.com/doc/refman/5.7/en/mathematical-functions.html#function_rand
[3] Oracle: evaluated at row level if no seed is specified, e.g.,
DBMS_RANDOM.normal, DBMS_RANDOM.value(1,10)
https://docs.oracle.com/database/timesten-18.1/TTPLP/d_random.htm#TTPLP71231
So keeping the current behavior and updating the definition of these two
functions to non-deterministic avoids any impact on users and makes the
semantics clear.
> Fix the determinism declaration of the rand function to be consistent with
> the current behavior
> -----------------------------------------------------------------------------------------------
>
> Key: FLINK-29091
> URL: https://issues.apache.org/jira/browse/FLINK-29091
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Planner
> Reporter: lincoln lee
> Priority: Major
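The difference between the two evaluation semantics discussed in the description can be illustrated with a small sketch. It is plain Python, not Flink or Calcite code; the helper names `eval_per_record` and `eval_per_query` are hypothetical and exist only for this illustration.

```python
import random


def eval_per_record(rows, rng):
    """Non-deterministic semantics: the function is called once for each row."""
    return [(row, rng.random()) for row in rows]


def eval_per_query(rows, rng):
    """Dynamic-function semantics: the function is called once per query
    and the same value is reused for every row."""
    value = rng.random()
    return [(row, value) for row in rows]


rows = ["a", "b", "c"]

# A fixed seed is used only to make the illustration reproducible.
per_record = eval_per_record(rows, random.Random(42))
per_query = eval_per_query(rows, random.Random(42))

# Count how many distinct random values each strategy produced.
distinct_per_record = len({value for _, value in per_record})
distinct_per_query = len({value for _, value in per_query})

print("per record:", distinct_per_record)
print("per query:", distinct_per_query)
```

Under per-record evaluation each row gets its own value, which matches the behavior the issue proposes to keep; under per-query evaluation all rows share a single value, which is what a dynamic-function declaration would imply for batch mode.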