Jonny Serencsa created SPARK-41636:
--------------------------------------

             Summary: DataSourceStrategy#selectFilters returns predicates in non-deterministic order
                 Key: SPARK-41636
                 URL: https://issues.apache.org/jira/browse/SPARK-41636
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.1.0
            Reporter: Jonny Serencsa


The method
{{org.apache.spark.sql.execution.datasources.DataSourceStrategy#selectFilters}},
which is used to determine "pushdown-able" filters, neither preserves the order
of its input {{Seq[Expression]}} nor returns predicates in a consistent order
for identical plans (modulo ExprId differences). This causes CodeGenerator
cache misses even when exactly the same LogicalPlan is executed.
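To illustrate why ordering matters for caching, here is a toy model, assuming (as we understand it) that compiled code is cached by the text of the generated source. ToyCodegenCache, compile and the "if (...) emit(row);" rendering are hypothetical and are not Spark's CodeGenerator; the sketch only shows that the same predicates rendered in a different order yield a different cache key.

```scala
import scala.collection.mutable

// Hypothetical toy model, NOT Spark's CodeGenerator: compiled code is cached by
// its generated source text, so predicate order is part of the cache key.
object ToyCodegenCache {
  private val cache = mutable.Map.empty[String, String]
  var misses = 0

  // Pretend "compilation": render the predicates, in the given order, into source.
  def compile(predicates: Seq[String]): String = {
    val source = predicates.mkString("if (", " && ", ") emit(row);")
    // The by-name default only runs (and counts a miss) when the key is absent.
    cache.getOrElseUpdate(source, { misses += 1; s"compiled[$source]" })
  }

  def main(args: Array[String]): Unit = {
    compile(Seq("a > 1", "b < 2", "c = 3")) // first time: miss
    compile(Seq("a > 1", "b < 2", "c = 3")) // same order: hit
    compile(Seq("c = 3", "a > 1", "b < 2")) // same predicates, new order: miss
    println(s"misses = $misses")            // prints "misses = 2"
  }
}
```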

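The method makes no attempt to preserve the order of the input predicates,
although it happens to do so when the input contains fewer than 5 pushdown-able
{{Expression}}s. This is due to the "small maps" logic reached through
{{scala.collection.TraversableOnce#toMap}}: immutable maps with at most four
entries are the specialized {{Map1}} to {{Map4}} classes, which iterate in
insertion order, while larger maps are hash-based.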
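The small-map behavior can be observed directly with the Scala standard library. In this sketch, SmallMapOrder and keysAfterToMap are hypothetical names, and the "predN" strings merely stand in for predicate expressions; with four entries the keys come back in insertion order, and with eight the order is determined by key hashes, so no particular order should be relied on.

```scala
// Hypothetical demo of immutable Map iteration order after toMap.
// Up to four entries: Map1..Map4, which iterate in insertion order.
// Five or more: a hash-based map, whose iteration order follows key hashes.
object SmallMapOrder {
  def keysAfterToMap(n: Int): List[String] =
    (1 to n).map(i => s"pred$i" -> i).toMap.keys.toList

  def main(args: Array[String]): Unit = {
    println(keysAfterToMap(4)) // List(pred1, pred2, pred3, pred4)
    println(keysAfterToMap(8)) // hash-determined order; do not rely on it
  }
}
```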

Returning the predicates in input order would reduce churn in the CodeGenerator
cache for long-running workloads that repeatedly execute very similar queries.
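One possible order-preserving approach is sketched below on a simplified, hypothetical model (not Spark's implementation, and not necessarily the fix adopted upstream). Predicates and filters are plain strings, and translate is an assumed stand-in for per-predicate filter translation. viaMap mirrors the collect-into-a-Map pattern, while viaSeq keeps the (predicate, filter) pairs as a Seq and builds a Map only for membership tests, so the output follows input order.

```scala
// Hypothetical, simplified model of a selectFilters-style function.
// Strings stand in for Catalyst expressions and data source filters.
object OrderPreservingSelect {
  // Assumed stand-in translation: only comparison predicates are "pushdown-able".
  def translate(p: String): Option[String] =
    if (p.contains(">") || p.contains("<") || p.contains("=")) Some(s"Filter($p)") else None

  // Order-dependent: iterating a Map loses input order beyond four entries.
  def viaMap(predicates: Seq[String]): Seq[String] =
    predicates.flatMap(p => translate(p).map(f => p -> f)).toMap.values.toList

  // Order-preserving: iterate the Seq of pairs; the Map is used only for lookups.
  // `distinct` keeps first occurrences, matching the Map's one-entry-per-key semantics.
  def viaSeq(predicates: Seq[String]): (Seq[String], Seq[String]) = {
    val translated = predicates.distinct.flatMap(p => translate(p).map(f => p -> f))
    val lookup = translated.toMap // membership tests only, never iterated
    val pushed = translated.map(_._2)
    val nonConvertible = predicates.filterNot(lookup.contains)
    (pushed, nonConvertible)
  }

  def main(args: Array[String]): Unit = {
    val preds = (1 to 6).map(i => s"c$i > $i") :+ "isNotSpecial(x)"
    println(viaMap(preds)) // six filters, in hash-determined order
    println(viaSeq(preds)) // filters in input order, plus the non-convertible predicate
  }
}
```

Because the pairs are produced by a single pass over the input, identical plans yield identical filter orders regardless of how many predicates are pushed down.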



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
