wypoon opened a new pull request, #4588: URL: https://github.com/apache/iceberg/pull/4588
This is an extension of #4395. Here we add a custom metric for the number of delete rows that have been applied in a scan of a format v2 table.

We introduce a counter in `BatchDataReader` and `RowDataReader` that is incremented when a delete is applied. This counter is passed into `DeleteFilter`, and in the cases where we construct a `PositionDeleteIndex`, it is passed into the `PositionDeleteIndex` implementation. In all the read paths, the counter is incremented whenever a delete is applied. When Spark calls `currentMetricsValues()` on a `PartitionReader`, which is a subclass of either `BatchDataReader` or `RowDataReader`, we get the current value of the counter and return it.

Tested manually by creating a format v2 table using each of Parquet, ORC, and Avro files, deleting and updating rows in the tables, and reading from the tables. The expected number of deletes shows up in the Spark UI. Also extended the existing unit tests (`DeleteReadTests`) to count the number of deletes applied during the scan.

<img width="529" alt="Screen Shot 2022-04-19 at 8 59 31 PM" src="https://user-images.githubusercontent.com/3925490/164147620-7085eafd-304d-45d1-aa53-0c6029638d48.png">
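For illustration, here is a minimal sketch of how a counter can be threaded through a `PositionDeleteIndex`: a delegating wrapper whose `isDeleted` check bumps a shared counter whenever it filters a row out of the scan. The `CountingPositionDeleteIndex` name and the `AtomicLong`-based counter are assumptions for this sketch, not necessarily the types used in the PR.

```java
import java.util.concurrent.atomic.AtomicLong;
import org.apache.iceberg.deletes.PositionDeleteIndex;

// Illustrative wrapper: delegates all index operations and counts each
// position-delete hit so the reader can report it as a task metric.
class CountingPositionDeleteIndex implements PositionDeleteIndex {
  private final PositionDeleteIndex delegate;
  private final AtomicLong deleteCounter;

  CountingPositionDeleteIndex(PositionDeleteIndex delegate, AtomicLong deleteCounter) {
    this.delegate = delegate;
    this.deleteCounter = deleteCounter;
  }

  @Override
  public void delete(long position) {
    delegate.delete(position);
  }

  @Override
  public void delete(long posStart, long posEnd) {
    delegate.delete(posStart, posEnd);
  }

  @Override
  public boolean isDeleted(long position) {
    boolean deleted = delegate.isDeleted(position);
    if (deleted) {
      // a row is being dropped from the scan output: count the applied delete
      deleteCounter.incrementAndGet();
    }
    return deleted;
  }

  @Override
  public boolean isEmpty() {
    return delegate.isEmpty();
  }
}
```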

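On the Spark side, DataSource V2 custom metrics are reported by returning `CustomTaskMetric` values from `PartitionReader#currentMetricsValues()`. The sketch below shows the shape of that wiring under the same assumptions as above; the `numDeletes` metric name and the `NumDeletesTaskMetric` class are illustrative, not the PR's actual identifiers.

```java
import java.util.concurrent.atomic.AtomicLong;
import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.connector.metric.CustomTaskMetric;
import org.apache.spark.sql.connector.read.PartitionReader;

// Task-side metric carrying the running delete count back to the driver.
class NumDeletesTaskMetric implements CustomTaskMetric {
  private final long value;

  NumDeletesTaskMetric(long value) {
    this.value = value;
  }

  @Override
  public String name() {
    return "numDeletes"; // must match the name of a driver-side CustomMetric
  }

  @Override
  public long value() {
    return value;
  }
}

// Skeleton reader showing where currentMetricsValues() samples the counter
// that the delete-applying code paths increment.
abstract class CountingPartitionReader implements PartitionReader<InternalRow> {
  protected final AtomicLong deleteCounter = new AtomicLong();

  @Override
  public CustomTaskMetric[] currentMetricsValues() {
    return new CustomTaskMetric[] {new NumDeletesTaskMetric(deleteCounter.get())};
  }
}
```

For the value to actually appear in the Spark UI, a matching driver-side `CustomMetric` with the same name (for a count like this, a `CustomSumMetric` subclass is the natural fit) also has to be exposed through the scan's `supportedCustomMetrics()`.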