Bruce Robbins created SPARK-47527:
-------------------------------------

             Summary: Cache misses for queries using With expressions
                 Key: SPARK-47527
                 URL: https://issues.apache.org/jira/browse/SPARK-47527
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 4.0.0
            Reporter: Bruce Robbins


For example:
{noformat}
create or replace temp view v1 as
select id from range(10);

create or replace temp view q1 as
select * from v1
where id between 2 and 4;

cache table q1;

explain select * from q1;

== Physical Plan ==
*(1) Filter ((id#51L >= 2) AND (id#51L <= 4))
+- *(1) Range (0, 10, step=1, splits=8)
{noformat}
Similarly:
{noformat}
create or replace temp view q2 as
select count_if(id > 3) as cnt
from v1;

cache table q2;

explain select * from q2;

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[], functions=[count(if (NOT _common_expr_0#88) null else 
_common_expr_0#88)])
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=182]
      +- HashAggregate(keys=[], functions=[partial_count(if (NOT 
_common_expr_0#88) null else _common_expr_0#88)])
         +- Project [(id#86L > 3) AS _common_expr_0#88]
            +- Range (0, 10, step=1, splits=8)

{noformat}
In the output of the above explain commands, neither list an 
{{InMemoryRelation}} node.

The culprit seems to be the common expression ids in the {{With}} expressions 
used in runtime replacements for {{between}} and {{count_if}}, e.g. [this 
code|https://github.com/apache/spark/blob/39500a315166d8e342b678ef3038995a03ce84d6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Between.scala#L43].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to