ChristinaTech commented on issue #6422:
URL: https://github.com/apache/iceberg/issues/6422#issuecomment-1518090607

   At present, an incremental read uses the original snapshots and their data 
files, even after those files have been rewritten. The primary limiting factor 
is that `replace` snapshots, which are what rewrite procedures produce and 
which add and remove data files without changing the actual data, do not track 
which files were used to create which other files, or how.
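
As a rough illustration of the behavior described above, here is a toy model in plain Python. It is not Iceberg code: the `Snapshot` class, the integer `seq` ordering, and `incremental_files` are hypothetical names made up for this sketch, and it assumes that an incremental read only collects files from `append` snapshots in the interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical, simplified stand-in for a table snapshot.
    seq: int                      # position in the table history (assumption)
    operation: str                # "append" or "replace"
    added: frozenset = frozenset()
    removed: frozenset = frozenset()


def incremental_files(history, start_exclusive, end_inclusive):
    """Collect files added by append snapshots in (start, end].

    Replace snapshots inside the interval are skipped, so the read
    still sees the files as they were originally written.
    """
    files = set()
    for snap in history:
        in_interval = start_exclusive < snap.seq <= end_inclusive
        if in_interval and snap.operation == "append":
            files |= snap.added
    return files


history = [
    Snapshot(1, "append", added=frozenset({"a.parquet"})),
    Snapshot(2, "append", added=frozenset({"b.parquet"})),
    Snapshot(3, "append", added=frozenset({"c.parquet"})),
    # Compaction: b and c are rewritten into one larger file.
    Snapshot(4, "replace",
             added=frozenset({"bc.parquet"}),
             removed=frozenset({"b.parquet", "c.parquet"})),
]

result = incremental_files(history, 1, 4)
```

In this model `result` contains the two small files `b.parquet` and `c.parquet`, not the compacted `bc.parquet`, which is the performance concern raised in this issue.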
   
   This means that, even if support were added for interpreting `replace` 
snapshots as they are today, their replacement files could only be used if 
every file removed by the `replace` snapshot fell within the interval of the 
incremental read.
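
The condition above amounts to a subset check. A minimal sketch, assuming file paths are compared as plain strings; `replacement_usable` is a hypothetical helper, not an Iceberg API:

```python
def replacement_usable(removed_by_replace, files_in_interval):
    """Return True only if every removed file belongs to the read interval.

    If any removed file lies outside the interval, the replacement files
    may contain rows the incremental read must not return.
    """
    return set(removed_by_replace) <= set(files_in_interval)


interval = {"b.parquet", "c.parquet"}

# All rewritten inputs were added inside the interval.
fully_inside = replacement_usable({"b.parquet", "c.parquet"}, interval)

# a.parquet was added before the interval started.
partly_outside = replacement_usable({"a.parquet", "b.parquet"}, interval)
```

Here `fully_inside` is `True` and `partly_outside` is `False`; in the second case the read would have to fall back to the original files.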
   
   This could be improved somewhat if `replace` snapshots stored a map of 
which specific files were used to create which other files, but even then it 
would not help in many cases, because a rewrite with default settings will 
generally end up merging files from inside the incremental read interval with 
files from outside it.
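
To make the limitation concrete, here is a sketch of what such a map would allow. The lineage map does not exist in `replace` snapshots today; its shape (output file to set of input files) and the `usable_outputs` helper are assumptions made up for this example.

```python
def usable_outputs(lineage, files_in_interval):
    """Return the rewritten files whose inputs all lie inside the interval.

    `lineage` is a hypothetical map: output file -> set of input files.
    """
    interval = set(files_in_interval)
    return {
        out
        for out, inputs in lineage.items()
        if inputs and set(inputs) <= interval
    }


# a.parquet predates the interval; b, c and d were added inside it.
interval = {"b.parquet", "c.parquet", "d.parquet"}

lineage = {
    "ab.parquet": {"a.parquet", "b.parquet"},   # mixes inside and outside
    "cd.parquet": {"c.parquet", "d.parquet"},   # entirely inside
}

usable = usable_outputs(lineage, interval)
```

Only `cd.parquet` ends up in `usable`. `ab.parquet` mixes a file from outside the interval with one from inside it, so `b.parquet` would still have to be read in its original form.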
   
   I will note that it would be beneficial if Iceberg could support this 
behavior, as it would help mitigate the performance impact of micro-batch file 
ingestion on incremental reads that take place after compaction. I need to 
spend some time brainstorming technical solutions to the problems that 
currently prevent this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


