huaxingao opened a new issue, #5273: URL: https://github.com/apache/iceberg/issues/5273
Spark 3.3 introduced the DS V2 Filter, along with a new filter pushdown interface that uses it: [`SupportsPushDownV2Filters`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownV2Filters.java).

One of the main motivations for the DS V2 Filter is to eliminate unnecessary conversions between Scala types and Catalyst types. The values in a Spark DS V1 Filter are Scala types. When translating a Catalyst Expression to V1 filters, Spark has to convert the Catalyst types used internally in rows to standard Scala types, and those Scala types are later converted back to Catalyst types.

For example, given the filter `ts = '1965-01-01 10:11:12.123456'`, converting the Catalyst Expression to a Spark data source Filter calls [`convertToScala`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala#L530) in [`translateLeafNodeFilter`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L507) to convert the long to a Timestamp. Later, [`convertLiteral`](https://github.com/apache/iceberg/blob/master/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkFilters.java#L222) is called in [`SparkFilters.convert`](https://github.com/apache/iceberg/blob/master/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkFilters.java#L111) to convert the Timestamp back to a long.

A DS V2 Filter uses Expressions as filter values (e.g. a `LiteralValue` holding a long for a Timestamp), so the conversion from long to Timestamp and from Timestamp back to long is avoided. By migrating to DS V2 filters, we can improve performance by avoiding these unnecessary data type conversions.
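To make the round trip concrete, here is a minimal sketch using only the JDK. It does not call Spark or Iceberg code: the class and method names (`TimestampRoundTrip`, `toMicros`, `microsToTimestamp`, `timestampToMicros`) are hypothetical stand-ins for the two conversion steps on the V1 path, and it assumes the internal timestamp value is microseconds since the epoch and that the literal is interpreted in UTC.

```java
import java.sql.Timestamp;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Illustrative only: hypothetical helpers, not the Spark or Iceberg implementation.
class TimestampRoundTrip {
    private static final long MICROS_PER_SECOND = 1_000_000L;
    private static final long NANOS_PER_MICRO = 1_000L;

    // Internal form assumed here: microseconds since the Unix epoch, stored as a long.
    static long toMicros(Instant instant) {
        long secondsAsMicros = Math.multiplyExact(instant.getEpochSecond(), MICROS_PER_SECOND);
        return Math.addExact(secondsAsMicros, instant.getNano() / NANOS_PER_MICRO);
    }

    // V1 path, step 1: internal long -> external java.sql.Timestamp object.
    static Timestamp microsToTimestamp(long micros) {
        long seconds = Math.floorDiv(micros, MICROS_PER_SECOND);
        long nanos = Math.floorMod(micros, MICROS_PER_SECOND) * NANOS_PER_MICRO;
        return Timestamp.from(Instant.ofEpochSecond(seconds, nanos));
    }

    // V1 path, step 2: external Timestamp -> long again on the data source side.
    static long timestampToMicros(Timestamp ts) {
        return toMicros(ts.toInstant());
    }

    public static void main(String[] args) {
        // The literal from the example filter, interpreted as UTC (an assumption).
        Instant literal = LocalDateTime.parse("1965-01-01T10:11:12.123456")
                .toInstant(ZoneOffset.UTC);
        long internal = toMicros(literal);

        // V1: allocate a Timestamp, then immediately convert it back.
        Timestamp external = microsToTimestamp(internal);
        long roundTripped = timestampToMicros(external);

        // V2: the long would be handed over as-is, with no intermediate object.
        long passedThrough = internal;

        System.out.println("V1 round trip preserves value: " + (roundTripped == internal));
        System.out.println("V2 value equals internal value: " + (passedThrough == internal));
    }
}
```

The two conversions cancel each other out, so the V1 path pays for an object allocation and two conversions per literal only to arrive back at the value it started with.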
