Yida Wu created IMPALA-14146:
--------------------------------

Summary: Incorrect rand() evaluation behavior in where condition
Key: IMPALA-14146
URL: https://issues.apache.org/jira/browse/IMPALA-14146
Project: IMPALA
Issue Type: Bug
Affects Versions: Impala 4.5.0
Reporter: Yida Wu
We've observed cases where rand() is re-evaluated multiple times within predicates, which can produce incorrect query results. For example:

{code:java}
Create table test1 (a int);
Insert into test1 values (1), (1), (1), (1);
{code}

Query:

{code:java}
select * from (select rand(1) as rd from test1) t1 where rd > 0.4 and rd < 0.4;
{code}

Expected result: no rows, since no value can be both greater than and less than 0.4.

Actual result:

{code:java}
+---------------------+
| rd                  |
+---------------------+
| 0.41702283693685577 |
+---------------------+
Fetched 1 row(s) in 0.12s
{code}

From the logging I added and from the query plan, I can see that rand() is evaluated twice for rd: the plan's scan node lists two separate rand(CAST(1 AS BIGINT)) calls in its predicates. However, even allowing for two different generated values, it is unclear how the overall condition evaluates to TRUE, since both logged values are greater than 0.4 and should therefore fail the rd < 0.4 check.

{code:java}
4081:I20250613 17:58:57.296629 23889 math-functions-ir.cc:207] e449dc09337fadec:1fb3575500000001] Rand() generated value: 0.417023
4082:I20250613 17:58:57.296660 23889 math-functions-ir.cc:207] e449dc09337fadec:1fb3575500000001] Rand() generated value: 0.458344
{code}

{code:java}
F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|  Per-Host Resources: mem-estimate=4.02MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
|  output exprs: rand(CAST(1 AS BIGINT))
|  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
01:EXCHANGE [UNPARTITIONED]
|  mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
|  tuple-ids=0 row-size=0B cardinality=1
|  in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
Per-Host Resources: mem-estimate=32.02MB mem-reservation=8.00KB thread-reservation=2
00:SCAN HDFS [default.test1, RANDOM]
   HDFS partitions=1/1 files=1 size=8B
   predicates: rand(CAST(1 AS BIGINT)) < CAST(0.4 AS DOUBLE), rand(CAST(1 AS BIGINT)) > CAST(0.4 AS DOUBLE)
   stored statistics:
     table: rows=unavailable size=unavailable
     columns: all
   extrapolated-rows=disabled max-scan-range-rows=unavailable
{code}

In summary, rand() appears to be re-evaluated in each predicate that references rd. Its value should be computed only once, as part of the output expression, rather than re-evaluated as a separate function call in every predicate. As the example shows, however, additional issues may also be contributing to the incorrect results; these could be related to IMPALA-14145.
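To illustrate why re-evaluating a nondeterministic expression inside each predicate changes query semantics, here is a minimal Python model. It is an illustration only, not Impala code: the names eval_once, eval_substituted and count_matches are hypothetical, and Python's random.Random merely stands in for a nondeterministic source; it does not reproduce Impala's rand(1) sequence. eval_once materializes rd once per row and tests both conjuncts against that value; eval_substituted calls the generator again in each conjunct, mirroring the two rand(CAST(1 AS BIGINT)) calls in the scan predicates. The conjunct order follows the order printed in the plan; Impala's actual evaluation order is an assumption here.

```python
import random
from typing import Callable

# Hypothetical model: Python's random.Random stands in for rand(1).
# It is NOT Impala's generator; it only provides a nondeterministic value.

Gen = Callable[[], float]


def eval_once(gen: Gen) -> bool:
    """Materialize rd once per row, then test both conjuncts on that value."""
    rd = gen()
    return rd > 0.4 and rd < 0.4  # unsatisfiable for any single value


def eval_substituted(gen: Gen) -> bool:
    """Mimic the plan: each conjunct calls the generator again.

    Conjunct order follows the plan's predicates line (< first, then >).
    """
    return gen() < 0.4 and gen() > 0.4


def count_matches(strategy: Callable[[Gen], bool], rows: int, seed: int) -> int:
    """Count how many of `rows` simulated rows pass the filter."""
    rng = random.Random(seed)
    return sum(strategy(rng.random) for _ in range(rows))


if __name__ == "__main__":
    print("evaluate once:   ", count_matches(eval_once, 1000, seed=1))
    print("per-predicate:   ", count_matches(eval_substituted, 1000, seed=1))
```

With materialization, no row can satisfy rd > 0.4 and rd < 0.4. With per-predicate re-evaluation and independent uniform draws, a row passes when the first draw is below 0.4 and the second is above it, so about 0.4 × 0.6 = 24% of rows are expected to pass. Note that this model does not reproduce the specific log above, where both generated values exceeded 0.4, so it does not account for that part of the report.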