bryanck opened a new pull request, #5143: URL: https://github.com/apache/iceberg/pull/5143
A performance regression was introduced by the refactoring of the Spark batch classes. The cause is that `BatchScanExec` does an equality check on the batch object. After the refactoring, however, each call to `toBatch()` created a new batch object, so the equality check returned false even for the same scan instance, because the batch objects were different. As a result, filters were not pushed down in some cases. This PR ensures that only one batch object is created per scan, so that a given scan object always returns the same batch object.
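To illustrate the behavior described above, here is a minimal sketch in plain Java. It does not use the actual Iceberg or Spark classes: `DemoBatch`, `UncachedScan`, `CachedScan`, and `BatchIdentityDemo` are hypothetical names invented for this example, and the sketch assumes the batch class does not override `equals()`, so equality falls back to object identity.

```java
// Hypothetical stand-in for a batch object; NOT the real Spark/Iceberg class.
// It does not override equals(), so equals() compares object identity.
final class DemoBatch {
    private final String description;

    DemoBatch(String description) {
        this.description = description;
    }

    String description() {
        return description;
    }
}

// Mirrors the regression: every toBatch() call builds a new object.
final class UncachedScan {
    DemoBatch toBatch() {
        return new DemoBatch("scan with pushed filters");
    }
}

// Mirrors the idea of the fix: build the batch once and reuse it.
final class CachedScan {
    private DemoBatch batch; // created lazily on first use

    synchronized DemoBatch toBatch() {
        if (batch == null) {
            batch = new DemoBatch("scan with pushed filters");
        }
        return batch;
    }
}

final class BatchIdentityDemo {
    public static void main(String[] args) {
        UncachedScan uncached = new UncachedScan();
        // Two calls on the same scan yield different objects.
        System.out.println(uncached.toBatch().equals(uncached.toBatch())); // prints "false"

        CachedScan cached = new CachedScan();
        // Two calls on the same scan yield the same object.
        System.out.println(cached.toBatch().equals(cached.toBatch())); // prints "true"
    }
}
```

Whether the real change caches the batch lazily, as here, or creates it eagerly is not stated in the description above; either approach would make repeated `toBatch()` calls on one scan compare equal.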
