Jonny Serencsa created SPARK-41636:
--------------------------------------
Summary: DataSourceStrategy#selectFilters returns predicates in
non-deterministic order
Key: SPARK-41636
URL: https://issues.apache.org/jira/browse/SPARK-41636
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.1.0
Reporter: Jonny Serencsa
The method
org.apache.spark.sql.execution.datasources.DataSourceStrategy#selectFilters,
which determines the "pushdown-able" filters, neither preserves the order
of the input {{Seq[Expression]}} nor returns the same order across
identical plans (modulo ExprId differences). This results in CodeGenerator
cache misses even when the exact same LogicalPlan is executed.
The method makes no attempt to maintain the order of the input predicates,
though it happens to do so when the input contains fewer than 5 pushdown-able
{{Expression}}s (due to the "small maps" logic in
{{scala.collection.TraversableOnce#toMap}}).
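
The "small maps" effect can be reproduced outside Spark. The following is a minimal sketch, not Spark code: the object name {{SmallMapOrderDemo}}, the helper {{keysAfterToMap}} and the string keys are hypothetical stand-ins for the predicates. It relies on the Scala standard library using insertion-ordered implementations for immutable maps of up to 4 entries and a hash-based map beyond that.

```scala
object SmallMapOrderDemo {
  // Builds an immutable Map from the keys and returns them in the Map's
  // iteration order. Keys are assumed to be distinct.
  def keysAfterToMap(keys: Seq[String]): Seq[String] =
    keys.zipWithIndex.toMap.keys.toSeq

  def main(args: Array[String]): Unit = {
    val four = Seq("d", "c", "b", "a")
    val many = (1 to 20).map(i => s"filter_$i")

    // Up to 4 entries: iteration follows insertion order.
    println(keysAfterToMap(four) == four)

    // 5 or more entries: iteration follows hash order, which is not
    // guaranteed to match the input order.
    println(keysAfterToMap(many))
  }
}
```

With 5 or more entries the resulting order depends on the keys' hash codes, so two plans that differ only in ExprId can plausibly yield differently ordered filters.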
Returning the predicates in the same order as the input would reduce churn on
the CodeGenerator cache under prolonged workloads that execute many very
similar queries.
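
One possible way to keep the input order is sketched below. This is an illustration under assumptions, not the actual Spark change: {{Predicate}}, {{SourceFilter}}, {{translate}} and {{selectInInputOrder}} are hypothetical names that stand in for Catalyst expressions, data source filters and the translation step. The idea is to collect the translated pairs into an insertion-ordered {{ListMap}} instead of a plain {{Map}}.

```scala
import scala.collection.immutable.ListMap

object OrderedSelectionDemo {
  // Hypothetical stand-ins for a Catalyst Expression and a data source Filter.
  final case class Predicate(sql: String, pushable: Boolean)
  final case class SourceFilter(sql: String)

  // Hypothetical translation step: only some predicates can be pushed down.
  def translate(p: Predicate): Option[SourceFilter] =
    if (p.pushable) Some(SourceFilter(p.sql)) else None

  // Keeps the predicate -> filter association in a ListMap, which iterates
  // in insertion order regardless of how many entries it holds.
  def selectInInputOrder(predicates: Seq[Predicate]): Seq[SourceFilter] = {
    val translated: ListMap[Predicate, SourceFilter] =
      ListMap(predicates.flatMap(p => translate(p).map(p -> _)): _*)
    translated.values.toSeq
  }
}
```

Because the output order then depends only on the input order, repeated executions of the same plan would produce the same filter sequence.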