akomisarek opened a new issue, #14106:
URL: https://github.com/apache/iceberg/issues/14106

   ### Query engine
   
   Not relevant, but Spark in our case. 
   
   ### Question
   
   Hello,
   
   In our previous setup we used the S3 sink connector to sink data from 
Kafka. It let us partition by landing time, so we had a guarantee that we 
could always easily query daily/hourly data. 
   
   Now with Iceberg and the Iceberg Kafka Connect sink, I don't see an easy 
way to achieve something similar. We implemented a processing-time SMT that 
adds a timestamp to each record, but conceptually it doesn't truly work: that 
timestamp is when the Kafka Connect framework receives the record, yet the 
record can physically appear in the Iceberg table later (up to the next 
commit) or even much later (due to connector issues). 
   
   For processing daily batches, is there an easy way to select only the 
records that arrived in a single day? Or do we physically need to use CDC and 
rely on commit timestamps to get the information we need? 
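   (To illustrate the commit-timestamp route: one could list the table's snapshots from the `snapshots` metadata table, pick those committed within the target day, and then run an incremental read between the bounding snapshot IDs, e.g. via Spark's `start-snapshot-id` / `end-snapshot-id` read options. A self-contained sketch of just the selection step, using hypothetical snapshot rows rather than real Iceberg API calls:)

```python
from datetime import datetime, timezone

# Hypothetical rows from the table's snapshots metadata table, ordered by
# commit time (e.g. SELECT committed_at, snapshot_id FROM db.tbl.snapshots).
snapshots = [
    (datetime(2024, 5, 1, 23, 50, tzinfo=timezone.utc), 100),
    (datetime(2024, 5, 2, 0, 10, tzinfo=timezone.utc), 101),
    (datetime(2024, 5, 2, 12, 0, tzinfo=timezone.utc), 102),
    (datetime(2024, 5, 3, 0, 5, tzinfo=timezone.utc), 103),
]

def snapshot_bounds_for_day(snapshots, day_start, day_end):
    """Return (start_snapshot_id, end_snapshot_id) covering commits in
    [day_start, day_end).

    start_snapshot_id is the last snapshot committed *before* the window,
    since incremental reads are exclusive of the start snapshot; it is None
    when the window begins at the start of the table's history.
    """
    start_id = None
    end_id = None
    for committed_at, snap_id in snapshots:
        if committed_at < day_start:
            start_id = snap_id
        elif committed_at < day_end:
            end_id = snap_id
    return start_id, end_id

start, end = snapshot_bounds_for_day(
    snapshots,
    datetime(2024, 5, 2, tzinfo=timezone.utc),
    datetime(2024, 5, 3, tzinfo=timezone.utc),
)
print(start, end)  # 100 102
```

   The resulting pair would then feed the incremental read, so the batch sees exactly the rows committed during that day, regardless of how late they arrived.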
   
   Any ideas?
   
   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

