Reo-LEI opened a new issue #3118: URL: https://github.com/apache/iceberg/issues/3118
In upsert/CDC use cases, we usually end up with a large number of pos-delete and eq-delete files. When we read or rewrite data from a v2 table, `DeleteFilter` opens every referenced pos-delete and eq-delete file for each data file in order to construct the posDeleteSet and eqDeleteSet. Currently, all of that work is handled by a single thread per `CombinedScanTask`, and the delete files are read serially: Iceberg cannot start reading one delete file until the previous one has been read, so `DeleteFilter` spends a lot of time opening and reading delete files. I think `DeleteFilter` should read delete files in parallel when constructing the posDeleteSet and eqDeleteSet, to speed up reads of v2 tables.
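To make the proposal concrete, here is a minimal sketch of the idea, not Iceberg's actual `DeleteFilter` code. It submits one read task per delete file to a shared thread pool and then merges the results into a single set. The class `ParallelDeleteLoader`, the method `loadAll`, and the `reader` function are hypothetical names introduced only for illustration; a real change would plug in Iceberg's own delete-file readers and record types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical helper for illustration; not part of Iceberg.
class ParallelDeleteLoader {

    // Reads every delete file on the given pool and merges the results.
    // F is the delete-file handle type, R is the deleted-record type
    // (for example, a row position for pos-deletes).
    static <F, R> Set<R> loadAll(List<F> files,
                                 Function<F, Collection<R>> reader,
                                 ExecutorService pool) {
        // Submit all reads first so they can run concurrently.
        List<CompletableFuture<Collection<R>>> pending = new ArrayList<>();
        for (F file : files) {
            pending.add(CompletableFuture.supplyAsync(() -> reader.apply(file), pool));
        }
        // Merge on the calling thread, so the result set needs no locking.
        Set<R> merged = new HashSet<>();
        for (CompletableFuture<Collection<R>> future : pending) {
            merged.addAll(future.join());
        }
        return merged;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<String> files = Arrays.asList("pos-delete-1", "pos-delete-2", "pos-delete-3");
            // Stand-in reader: pretends each file holds two deleted row positions.
            Function<String, Collection<Long>> reader = name -> {
                long n = Long.parseLong(name.substring(name.lastIndexOf('-') + 1));
                return Arrays.asList(n * 10, n * 10 + 1);
            };
            Set<Long> posDeleteSet = loadAll(files, reader, pool);
            System.out.println("deleted positions: " + posDeleteSet.size());
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch a failed read surfaces from `join()` as a `CompletionException`, and the pool size bounds how many delete files are open at once. Both would need to be tuned against the memory and I/O limits of the engine running the scan.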
