Andy Grove created ARROW-10686:
----------------------------------
Summary: [Rust] [DataFusion] Combine conjunctive filters
Key: ARROW-10686
URL: https://issues.apache.org/jira/browse/ARROW-10686
Project: Apache Arrow
Issue Type: Improvement
Components: Rust - DataFusion
Reporter: Andy Grove
When using the DataFrame API, it is natural to chain together filter operations
like this:
{code:java}
.filter(col("l_commitdate").lt(col("l_receiptdate")))?
.filter(col("l_shipdate").lt(col("l_commitdate")))?
.filter(col("l_receiptdate").gt_eq(lit("1994-01-01")))?
.filter(col("l_receiptdate").lt(lit("1995-01-01")))?{code}
This results in the following plan:
{code:java}
Filter: #l_receiptdate Lt Utf8("1995-01-01")
Filter: #l_receiptdate GtEq Utf8("1994-01-01")
Filter: #l_shipdate Lt #l_commitdate
Filter: #l_commitdate Lt #l_receiptdate{code}
We could implement an optimizer rule that combines these into a single filter:
{code:java}
Filter: #l_receiptdate Lt Utf8("1995-01-01") AND #l_receiptdate GtEq Utf8("1994-01-01") AND #l_shipdate Lt #l_commitdate AND #l_commitdate Lt #l_receiptdate{code}
This would make the plan more concise and may reduce overhead, since each batch would pass through one filter operator rather than four.
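As a rough illustration of the proposed rule, here is a minimal sketch that collapses a chain of directly nested filters into a single conjunctive filter. The `Expr` and `Plan` types, and the `filter`, `render`, and `combine_filters` functions, are hypothetical stand-ins invented for this example; they are not DataFusion's actual logical plan or optimizer API, and a real rule would need to be written against those.

```rust
// Toy stand-ins for illustration only; these are NOT DataFusion's types.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// An opaque predicate, stored as text to keep the sketch small.
    Pred(String),
    /// Conjunction of two predicates.
    And(Box<Expr>, Box<Expr>),
}

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Leaf node: a table scan identified by table name.
    Scan(String),
    /// Keeps only the rows of `input` that satisfy `predicate`.
    Filter { predicate: Expr, input: Box<Plan> },
}

/// Hypothetical helper: wrap `input` in a filter with a textual predicate.
fn filter(pred: &str, input: Plan) -> Plan {
    Plan::Filter {
        predicate: Expr::Pred(pred.to_string()),
        input: Box::new(input),
    }
}

/// Render a predicate in the same "a AND b" style as the plans above.
fn render(expr: &Expr) -> String {
    match expr {
        Expr::Pred(text) => text.clone(),
        Expr::And(left, right) => format!("{} AND {}", render(left), render(right)),
    }
}

/// Collapse every chain of directly nested filters into one filter whose
/// predicate is the AND of all predicates in the chain, outermost first.
fn combine_filters(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { predicate, input } => {
            // Rewrite the child first, so any chain below this node has
            // already been reduced to at most one filter.
            match combine_filters(*input) {
                // Child is a filter: merge it into this one.
                Plan::Filter { predicate: inner, input: grandchild } => Plan::Filter {
                    predicate: Expr::And(Box::new(predicate), Box::new(inner)),
                    input: grandchild,
                },
                // Child is something else: keep this filter as it is.
                other => Plan::Filter { predicate, input: Box::new(other) },
            }
        }
        // Non-filter nodes are returned unchanged in this sketch.
        other => other,
    }
}
```

Applied to four nested filters over a scan, `combine_filters` returns a single `Filter` directly above the scan, and `render` on its predicate lists the four predicates joined by `AND`, with the outermost filter's predicate first, matching the ordering in the combined plan above.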