ChristinaTech commented on issue #6422: URL: https://github.com/apache/iceberg/issues/6422#issuecomment-1524591137
So just because incremental reads can't use compacted data doesn't necessarily mean you can't use them, just that they would potentially be less efficient. How inefficient they would be depends largely on how efficiently the data was laid out when it was first appended to the Iceberg table.

Querying on a partitioned time column, by contrast, would have the advantage of using compacted files where available, but it comes with the disadvantage of having to deal with missing or duplicate results from late-arriving data. That can of course be worked around by waiting until you are certain no more data will arrive for a given time range before querying it, but that increases the delay between data arriving and actually being processed.

Since my last comment I have spent some time thinking about how incremental reads could be improved to use compacted data, and I am going to open a separate issue in the coming days to track some potential improvements to this process.
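
For context, here is a rough Spark sketch of the two read patterns being compared; the table name, column name, snapshot IDs, and timestamps are placeholders:

```scala
// Incremental read between two snapshots: scans only the files appended in
// that snapshot range, but cannot take advantage of files rewritten by compaction.
val incremental = spark.read
  .format("iceberg")
  .option("start-snapshot-id", "1111111111111111111") // placeholder snapshot id (exclusive)
  .option("end-snapshot-id", "2222222222222222222")   // placeholder snapshot id (inclusive)
  .load("db.events")

// Time-range query on a partitioned time column: benefits from compacted files,
// but late-arriving rows may be missed or re-read unless you wait out a delay
// before processing each time range.
val byTime = spark.read
  .format("iceberg")
  .load("db.events")
  .where("event_ts >= TIMESTAMP '2023-04-01 00:00:00' AND event_ts < TIMESTAMP '2023-04-02 00:00:00'")
```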
