[ https://issues.apache.org/jira/browse/SPARK-45387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-45387: ----------------------------------- Labels: pull-request-available (was: ) > Optimize hive patition filter when the comparision dataType not match > --------------------------------------------------------------------- > > Key: SPARK-45387 > URL: https://issues.apache.org/jira/browse/SPARK-45387 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.1.1, 3.1.2, 3.3.0, 3.4.0 > Reporter: TianyiMa > Priority: Critical > Labels: pull-request-available > Attachments: PruneFileSourcePartitions.diff > > > Suppose we have a partitioned table `table_pt` with partition colum `dt` > which is StringType and the table metadata is managed by Hive Metastore, if > we filter partition by dt = '123', this filter can be pushed down to data > source directly, but if the filter condition is number, e.g. dt = 123, Spark > will not known which partition should be pushed down. Thus in the process of > physical plan optimization, Spark will pull all of that table's partition > meta data to client side, to decide which partition filter should be push > down to the data source. This is poor of performance if the table has > thousands of partitions and increasing the risk of hive metastore oom. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org