akomisarek opened a new issue, #14106: URL: https://github.com/apache/iceberg/issues/14106
### Query engine

Not relevant, but Spark in our instance.

### Question

Hello,

In our previous setup we used the S3 sink connector to sink data from Kafka. It let us partition by landing time, so we had a guarantee that we could always easily query daily/hourly data.

Now with Iceberg and the Kafka Connect Iceberg sink, I don't see how we can easily achieve something similar. We implemented a processing-time SMT and add the timestamp to each record, but conceptually it doesn't truly work: that is the time the Kafka Connect framework receives the record, yet the record can physically appear in the Iceberg table later (up to the commit interval) or even much later (due to connector issues).

For processing daily batches of data, is there an easy way to process only the records that arrived in a single day? Or do we need to rely on CDC / commit timestamps to get the information we need?

Any ideas? Thanks!
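One direction the commit-timestamp idea points at: Iceberg exposes each table's snapshot history through a `snapshots` metadata table that Spark can query directly, and each snapshot carries a `committed_at` timestamp. A minimal sketch, assuming a table named `db.events` (placeholder) and a catalog configured in Spark:

```sql
-- List snapshots committed during one calendar day.
-- `db.events` is a placeholder table name; `committed_at`,
-- `snapshot_id`, and `operation` are columns of the snapshots
-- metadata table.
SELECT snapshot_id, committed_at, operation
FROM db.events.snapshots
WHERE committed_at >= TIMESTAMP '2025-01-01 00:00:00'
  AND committed_at <  TIMESTAMP '2025-01-02 00:00:00'
ORDER BY committed_at;
```

The resulting snapshot IDs could then drive Spark's incremental read (the `start-snapshot-id` / `end-snapshot-id` options on the Iceberg DataFrame reader) to scan only data appended in that window. This is essentially the commit-timestamp approach raised in the question, without standing up a full CDC pipeline; whether it is precise enough depends on how closely commit time tracks arrival time for your connector.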