Reo-LEI opened a new issue #3118:
URL: https://github.com/apache/iceberg/issues/3118


   In upsert/CDC workloads, a table usually accumulates a lot of pos-delete and 
eq-delete files. When we read or rewrite data from a v2 table, `DeleteFilter` 
opens all referenced pos-delete and eq-delete files for each data file to 
construct the posDeleteSet and eqDeleteSet. 
   
   Currently, all of that work is handled by a single thread for each 
`CombinedScanTask`, and the delete files are read serially. That means Iceberg 
cannot start reading a delete file until the previous one has been read, so 
`DeleteFilter` takes a long time to open and read delete files. 
   I think `DeleteFilter` should read delete files in parallel when constructing 
the posDeleteSet and eqDeleteSet, to speed up reading v2 tables.
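
   To illustrate the idea, here is a minimal sketch of reading several delete 
files concurrently and merging the results into one set. It is not Iceberg 
code: `ParallelDeleteLoader`, `loadPositions` and the `readDeleteFile` callback 
are hypothetical names, and the "files" are an in-memory map standing in for 
real pos-delete files. It only shows the general pattern of submitting one read 
task per delete file to a shared pool instead of looping over them serially.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of Iceberg.
public class ParallelDeleteLoader {

  // Submits one read task per delete file, then merges all results
  // into a single set on the calling thread.
  public static Set<Long> loadPositions(
      List<String> deleteFiles,
      Function<String, List<Long>> readDeleteFile, // assumed reader callback
      ExecutorService pool) throws InterruptedException, ExecutionException {
    List<Future<List<Long>>> pending = new ArrayList<>();
    for (String file : deleteFiles) {
      Callable<List<Long>> task = () -> readDeleteFile.apply(file);
      pending.add(pool.submit(task));
    }
    Set<Long> positions = new HashSet<>();
    for (Future<List<Long>> result : pending) {
      positions.addAll(result.get()); // blocks until that file has been read
    }
    return positions;
  }

  public static void main(String[] args) throws Exception {
    // In-memory stand-in for two pos-delete files (illustrative data only).
    Map<String, List<Long>> files = new HashMap<>();
    files.put("delete-a", Arrays.asList(1L, 5L));
    files.put("delete-b", Arrays.asList(5L, 9L));

    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      Set<Long> deleted = loadPositions(new ArrayList<>(files.keySet()), files::get, pool);
      System.out.println("deleted positions: " + deleted.size());
    } finally {
      pool.shutdown();
    }
  }
}
```

   Merging on the calling thread keeps the result set free of concurrent 
writes. A real change would also need to decide which pool to use and how to 
bound memory when many delete files are open at once.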
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


