ChristinaTech commented on issue #6422:
URL: https://github.com/apache/iceberg/issues/6422#issuecomment-1524591137

   So just because incremental reads can't use compacted data doesn't 
necessarily mean you can't use incremental reads; it just means they would 
potentially be less efficient. How inefficient they would be depends largely 
on how efficiently the data was stored when it was first appended to the 
Iceberg table. 
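   A toy model may make this concrete. The sketch below is plain Python, not 
Iceberg's API: `Snapshot`, `incremental_append_files`, and `live_files` are 
hypothetical names. It assumes, loosely following how an incremental read 
between `start-snapshot-id` and `end-snapshot-id` behaves, that only files 
added by append snapshots are read and compaction ("replace") snapshots are 
skipped.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Snapshot:
    # Hypothetical, simplified snapshot. Real Iceberg snapshot ids are not
    # ordered; this model simply assumes ids increase in commit order.
    snapshot_id: int
    operation: str  # "append" for new data, "replace" for compaction
    added_files: List[str]
    removed_files: List[str] = field(default_factory=list)


def incremental_append_files(snapshots, start_exclusive, end_inclusive):
    """Files an incremental read over (start, end] would touch: only files
    added by append snapshots, so compaction output is never used."""
    files = []
    for snap in snapshots:
        in_range = start_exclusive < snap.snapshot_id <= end_inclusive
        if in_range and snap.operation == "append":
            files.extend(snap.added_files)
    return files


def live_files(snapshots, as_of):
    """Files a regular scan of the table state at `as_of` would read."""
    live = []
    for snap in snapshots:
        if snap.snapshot_id > as_of:
            break
        live = [f for f in live if f not in snap.removed_files]
        live.extend(snap.added_files)
    return live


history = [
    Snapshot(1, "append", ["a1.parquet", "a2.parquet"]),
    Snapshot(2, "append", ["a3.parquet"]),
    # Compaction rewrites the three small files into one larger file.
    Snapshot(3, "replace", ["c1.parquet"], ["a1.parquet", "a2.parquet", "a3.parquet"]),
    Snapshot(4, "append", ["a4.parquet"]),
]

print(incremental_append_files(history, 0, 4))  # ['a1.parquet', 'a2.parquet', 'a3.parquet', 'a4.parquet']
print(live_files(history, 4))                    # ['c1.parquet', 'a4.parquet']
```

   In this model the incremental read still returns all the data; it just 
reads the original small files (which therefore must still exist, i.e. the 
older snapshots must not have been expired), while a regular scan benefits 
from the compacted `c1.parquet`.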
   
   While querying on a partitioned time column would have the advantage of 
using compacted files where available, it would come with the disadvantage of 
having to deal with missing or duplicate results caused by late-arriving data. 
This could of course be worked around by waiting until you were certain no 
more data would arrive for a given time range before querying it, but that 
would increase the delay between data arriving and actually being processed.
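   A small simulation of that trade-off, as a sketch only: the rows, the 
`query_window` and `window_is_safe` helpers, and the 30-minute 
`allowed_lateness` are all hypothetical. It shows a time-range query 
missing a late row, and the "wait until the window is safe" workaround that 
trades completeness for processing delay.

```python
from datetime import datetime, timedelta

# Hypothetical rows: event_time is the partitioned time column,
# arrived_at is when the row was appended to the table.
rows = [
    {"id": 1, "event_time": datetime(2023, 4, 1, 10, 15), "arrived_at": datetime(2023, 4, 1, 10, 16)},
    {"id": 2, "event_time": datetime(2023, 4, 1, 10, 40), "arrived_at": datetime(2023, 4, 1, 10, 41)},
    # Late row: belongs to the 10:00-11:00 window but only lands at 11:20.
    {"id": 3, "event_time": datetime(2023, 4, 1, 10, 50), "arrived_at": datetime(2023, 4, 1, 11, 20)},
]


def query_window(rows, start, end, now):
    """Ids of rows visible at `now` whose event_time falls in [start, end)."""
    return [r["id"] for r in rows if r["arrived_at"] <= now and start <= r["event_time"] < end]


def window_is_safe(end, now, allowed_lateness):
    """Treat a window as final only once allowed_lateness has passed since it closed."""
    return now >= end + allowed_lateness


start, end = datetime(2023, 4, 1, 10), datetime(2023, 4, 1, 11)
lateness = timedelta(minutes=30)

early = query_window(rows, start, end, now=datetime(2023, 4, 1, 11, 5))  # [1, 2]: row 3 is missed
late = query_window(rows, start, end, now=datetime(2023, 4, 1, 11, 30))  # [1, 2, 3]

print(early, window_is_safe(end, datetime(2023, 4, 1, 11, 5), lateness))  # [1, 2] False
print(late, window_is_safe(end, datetime(2023, 4, 1, 11, 30), lateness))  # [1, 2, 3] True
```

   If the window were processed at 11:05 and then re-processed after the late 
row arrived, rows 1 and 2 would be emitted twice; waiting for the lateness 
bound avoids both problems but delays processing by that bound.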
   
   Since my last comment, I have spent some time thinking about how incremental 
reads could be improved to use compacted data, and I am going to open a 
separate issue in the coming days to track some potential improvements to 
this process.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]