[
https://issues.apache.org/jira/browse/SPARK-46460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot reassigned SPARK-46460:
--------------------------------------
Assignee: (was: Apache Spark)
> The filter of partition including cast function may lead the partition
> pruning to disable
> -----------------------------------------------------------------------------------------
>
> Key: SPARK-46460
> URL: https://issues.apache.org/jira/browse/SPARK-46460
> Project: Spark
> Issue Type: Improvement
> Components: Optimizer, SQL
> Affects Versions: 3.2.0
> Reporter: Zhou Tong
> Priority: Minor
> Labels: pull-request-available
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> SQL:select * from test_db.test_table where day between
> date_sub('2023-12-01',1) and '2023-12-03'
> The Physical Plan of sql above will implement _cast_ function on partition
> col 'day', like this, {_}cast(day as date) > 2023-11-30{_}. In this
> situation, spark just pass the filter condition _day < "2023-12-03"_ to
> HiveMetastore, not including filter condition {_}cast(day as date) >
> 2023-11-30{_}, which may lead performance of HMS degarde if the HiveTable has
> huge number of partitions.
>
> In this regard, a new rule may solve this problem. This rule can convert
> binary comparison _cast(day as date) > 2023-11-30_ to {_}day >
> cast(2023-11-30 as string){_}. The right node is foldable, so the result is
> {_}day > "2023-11-30"{_}, and the filter condition passed to HMS will be _day
> > "2023-11-30" and_ _day < "2023-12-03"._
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]