Tom van Bussel created SPARK-46794:
--------------------------------------

             Summary: Incorrect results due to inferred predicate from checkpoint with subquery
                 Key: SPARK-46794
                 URL: https://issues.apache.org/jira/browse/SPARK-46794
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.5.0
            Reporter: Tom van Bussel


Spark can produce incorrect results when using a checkpointed DataFrame with a 
filter containing a scalar subquery. This subquery is included in the 
constraints of the resulting LogicalRDD, and may then be propagated as a filter 
when joining with the checkpointed DataFrame. This causes the subquery to be 
evaluated twice: once during checkpointing and once while evaluating the query. 
These two subquery evaluations may return different results, for example when 
the subquery contains a LIMIT over a sort order that leaves ties unresolved.
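
The mechanism described above can be sketched with a toy model in plain Python. 
This is not Spark code and does not call any Spark API; the table contents, the 
helper names (scalar_subquery, join_on_id) and the two scan orders are all 
hypothetical, chosen only to show how re-evaluating an order-dependent subquery 
can change a join result.

```python
# Toy model in plain Python (NOT Spark). All names and data are hypothetical.

# (id, sort_key) rows; "a" and "b" tie on sort_key, so "ORDER BY sort_key LIMIT 1"
# does not determine which of the two is returned.
ROWS = [("a", 1), ("b", 1), ("c", 2)]


def scalar_subquery(scan):
    """Model `SELECT id ... ORDER BY sort_key LIMIT 1`: ties go to the row seen first."""
    return min(scan, key=lambda row: row[1])[0]


def join_on_id(left, right):
    """Inner join of (id, sort_key) rows with (id, payload) rows on id."""
    return [(l[0], l[1], r[1]) for l in left for r in right if l[0] == r[0]]


# Two evaluations of the same subquery that happen to see rows in different orders.
first_scan = ROWS
second_scan = list(reversed(ROWS))

# Checkpoint: the filter `id = (subquery)` is evaluated once and the rows are materialized.
picked_at_checkpoint = scalar_subquery(first_scan)
checkpointed = [row for row in ROWS if row[0] == picked_at_checkpoint]

other = [("a", "x"), ("b", "y")]

# Result if the other join side is left untouched.
expected = join_on_id(checkpointed, other)

# Result if the predicate is propagated to the other side and the subquery is
# evaluated a second time, now seeing a different row order.
picked_at_query = scalar_subquery(second_scan)
filtered_other = [row for row in other if row[0] == picked_at_query]
actual = join_on_id(checkpointed, filtered_other)

print("expected:", expected)
print("actual:  ", actual)
```

In this model the checkpoint keeps the row with id "a", while the second 
evaluation picks "b", so the propagated filter removes the only matching row 
and the join comes back empty.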



