Bruce Robbins created SPARK-47527:
-------------------------------------

             Summary: Cache misses for queries using With expressions
                 Key: SPARK-47527
                 URL: https://issues.apache.org/jira/browse/SPARK-47527
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 4.0.0
            Reporter: Bruce Robbins

For example:
{noformat}
create or replace temp view v1 as
select id from range(10);

create or replace temp view q1 as
select * from v1
where id between 2 and 4;

cache table q1;

explain select * from q1;

== Physical Plan ==
*(1) Filter ((id#51L >= 2) AND (id#51L <= 4))
+- *(1) Range (0, 10, step=1, splits=8)
{noformat}
Similarly:
{noformat}
create or replace temp view q2 as
select count_if(id > 3) as cnt
from v1;

cache table q2;

explain select * from q2;

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[], functions=[count(if (NOT _common_expr_0#88) null else _common_expr_0#88)])
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=182]
      +- HashAggregate(keys=[], functions=[partial_count(if (NOT _common_expr_0#88) null else _common_expr_0#88)])
         +- Project [(id#86L > 3) AS _common_expr_0#88]
            +- Range (0, 10, step=1, splits=8)
{noformat}
Neither of the explain outputs above lists an {{InMemoryRelation}} node, even though both views were cached.

The culprit seems to be the common expression ids in the {{With}} expressions used in the runtime replacements for {{between}} and {{count_if}}, e.g. [this code|https://github.com/apache/spark/blob/39500a315166d8e342b678ef3038995a03ce84d6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Between.scala#L43].
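
To illustrate the suspected mechanism, here is a minimal toy model in Python. It is not Spark code: the names {{With}}, {{CommonExprRef}}, {{between}} and {{normalize}} below are hypothetical stand-ins, and the model assumes that a cache lookup compares the cached plan and the query plan structurally. Under that assumption, a rewrite that allocates a fresh id on every analysis makes two otherwise identical expressions unequal, while renumbering the ids before comparing makes them match again.

```python
import itertools
from dataclasses import dataclass

# Hypothetical global id generator: every call hands out a new id,
# mimicking a rewrite that allocates a fresh common expression id.
_next_id = itertools.count()


@dataclass(frozen=True)
class CommonExprRef:
    """Toy reference to a common expression, identified only by its id."""
    id: int


@dataclass(frozen=True)
class With:
    """Toy With expression: one common expression definition plus a body."""
    def_id: int
    definition: str
    body: tuple  # flat token sequence mixing literals and CommonExprRef


def between(child, lower, upper):
    """Toy rewrite of `child BETWEEN lower AND upper` using a fresh id."""
    ref = CommonExprRef(next(_next_id))
    return With(ref.id, child, (ref, ">=", lower, "AND", ref, "<=", upper))


def normalize(expr):
    """Renumber the single common expression id to 0 (assumed fix, not Spark's)."""
    body = tuple(
        CommonExprRef(0) if isinstance(tok, CommonExprRef) else tok
        for tok in expr.body
    )
    return With(0, expr.definition, body)


# The same SQL predicate analyzed twice: once when caching, once when querying.
cached_plan = between("id", 2, 4)
query_plan = between("id", 2, 4)

# Lookup keyed on the raw expression misses; keyed on the normalized form it hits.
raw_cache = {cached_plan: "InMemoryRelation"}
normalized_cache = {normalize(cached_plan): "InMemoryRelation"}
raw_lookup = raw_cache.get(query_plan)
normalized_lookup = normalized_cache.get(normalize(query_plan))
```

In this sketch {{raw_lookup}} comes back empty because the two expressions differ only in their ids, whereas {{normalized_lookup}} finds the cached entry. Whether Spark's actual plan comparison behaves this way for {{With}} is exactly what this issue suggests needs checking.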