Tom van Bussel created SPARK-46794:
--------------------------------------

             Summary: Incorrect results due to inferred predicate from checkpoint with subquery
                 Key: SPARK-46794
                 URL: https://issues.apache.org/jira/browse/SPARK-46794
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.5.0
            Reporter: Tom van Bussel
Spark can produce incorrect results when using a checkpointed DataFrame with a filter containing a scalar subquery. This subquery is included in the constraints of the resulting LogicalRDD, and may then be propagated as a filter when joining with the checkpointed DataFrame. This causes the subquery to be evaluated twice: once during checkpointing and once while evaluating the query. These two subquery evaluations may return different results, e.g. when the subquery contains a limit with an underspecified sort order.
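To make the failure mode concrete, here is a small pure-Python model of it. This is not Spark code and does not call any Spark API. Every name in it (PRICES, make_scalar_subquery, checkpoint, join) is hypothetical. It assumes a subquery like "SELECT item FROM prices ORDER BY sort_key LIMIT 1" where two rows tie on sort_key, so either row is a valid answer. The model cycles through the tied rows on each call as a stand-in for the engine picking a different tie on re-execution. It also assumes the inferred predicate lands on the other side of an equi-join, as one way the constraint could be propagated.

```python
import itertools

# Hypothetical "prices" table: both rows tie on sort_key, so
# ORDER BY sort_key LIMIT 1 does not determine a unique row.
PRICES = [("a", 1), ("b", 1)]


def make_scalar_subquery():
    """Model `SELECT item FROM prices ORDER BY sort_key LIMIT 1`.

    Assumption for illustration: each evaluation returns the next tied row,
    standing in for an engine that may pick any tied row on each run.
    """
    calls = itertools.count()

    def evaluate():
        min_key = min(k for _, k in PRICES)
        tied = [item for item, k in PRICES if k == min_key]
        return tied[next(calls) % len(tied)]

    return evaluate


def checkpoint(rows):
    # Materialize once; later reads see exactly these rows.
    return list(rows)


def join(left, right, inferred_filter=None):
    # Inner equi-join on item. If given, inferred_filter models a predicate
    # derived from the left side's constraints and pushed onto the right side.
    if inferred_filter is not None:
        right = [r for r in right if inferred_filter(r)]
    return [(l, r) for l in left for r in right if l == r]


subquery = make_scalar_subquery()
orders = ["a", "b", "c"]

# Checkpointed DataFrame: orders filtered by `item = (scalar subquery)`.
# First subquery evaluation happens here, during checkpointing.
selected = subquery()
checkpointed = checkpoint(o for o in orders if o == selected)

other = ["a", "b"]

# Expected result: the join uses only the materialized checkpoint rows.
correct = join(checkpointed, other)

# Buggy plan: the constraint `item = (scalar subquery)` is reused as a filter
# on the other side, which re-evaluates the subquery (second evaluation).
reevaluated = subquery()
buggy = join(checkpointed, other, inferred_filter=lambda r: r == reevaluated)

print("checkpointed:", checkpointed)
print("correct:", correct)
print("buggy:", buggy)
```

In this model the checkpoint holds the row "a", but the re-evaluated subquery returns "b", so the inferred filter removes the only matching row and the join comes back empty instead of returning ("a", "a"). The point is that a predicate derived from a checkpoint's constraints is only safe if re-evaluating it is guaranteed to reproduce the value seen when the checkpoint was materialized.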