huaxingao opened a new issue, #5273:
URL: https://github.com/apache/iceberg/issues/5273

Spark 3.3 implemented the DS V2 Filter and added a new filter pushdown 
interface that uses it, 
[`SupportsPushDownV2Filters`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownV2Filters.java).
   
One of the major motivations for Spark DS V2 filters is to eliminate 
unnecessary conversions between Scala types and Catalyst types. The values in 
Spark DS V1 filters are Scala types. When translating a Catalyst `Expression` 
to V1 filters, Spark has to convert the Catalyst types used internally in rows 
to standard Scala types, and those Scala types are later converted back to 
Catalyst types. For example, given the filter 
`ts = '1965-01-01 10:11:12.123456'`, converting the Catalyst `Expression` to a 
Spark data source `Filter` calls 
[`convertToScala`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala#L530) 
in 
[`translateLeafNodeFilter`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L507) 
to convert the long to a `Timestamp`. Later on, 
[`convertLiteral`](https://github.com/apache/iceberg/blob/master/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkFilters.java#L222) 
is called in 
[`SparkFilters.convert`](https://github.com/apache/iceberg/blob/master/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkFilters.java#L111) 
to convert the `Timestamp` back to a long. DS V2 filters use `Expression` for 
filter values (e.g. a `LiteralValue` holding a long for a timestamp), so the 
conversion from long to `Timestamp` and from `Timestamp` back to long is 
avoided. By migrating to DS V2 filters, we can improve performance by avoiding 
these unnecessary data type conversions.
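
To make the round trip concrete, here is a minimal sketch in plain JDK Java. It does not call Spark or Iceberg: the class and method names (`TimestampRoundTrip`, `toMicros`, `microsToTimestamp`, `timestampToMicros`) are hypothetical stand-ins for the two conversion steps described above, and the sketch assumes the internal timestamp value is a long counting microseconds since the epoch in UTC. It ignores details the real code paths handle, such as calendar rebasing.

```java
import java.sql.Timestamp;
import java.time.Instant;

// Hypothetical illustration only; not Spark's or Iceberg's actual code.
class TimestampRoundTrip {
    private static final long MICROS_PER_SECOND = 1_000_000L;
    private static final long NANOS_PER_MICRO = 1_000L;

    // Assumed internal representation: microseconds since the epoch as a long.
    static long toMicros(Instant instant) {
        long secondsAsMicros = Math.multiplyExact(instant.getEpochSecond(), MICROS_PER_SECOND);
        return Math.addExact(secondsAsMicros, instant.getNano() / NANOS_PER_MICRO);
    }

    // Step 1 on the V1 path: internal long -> external java.sql.Timestamp.
    // floorDiv/floorMod keep pre-1970 (negative) values correct.
    static Timestamp microsToTimestamp(long micros) {
        long seconds = Math.floorDiv(micros, MICROS_PER_SECOND);
        long microsOfSecond = Math.floorMod(micros, MICROS_PER_SECOND);
        return Timestamp.from(Instant.ofEpochSecond(seconds, microsOfSecond * NANOS_PER_MICRO));
    }

    // Step 2 on the V1 path: external java.sql.Timestamp -> long again.
    static long timestampToMicros(Timestamp timestamp) {
        return toMicros(timestamp.toInstant());
    }
}
```

Calling `timestampToMicros(microsToTimestamp(micros))` returns the same `micros` that went in, so on the V1 path both steps amount to an object allocation plus arithmetic that produces no new information. A filter that carries the long directly, as the V2 literal does, skips both steps.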
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

