udaysagar2177 commented on PR #17688:
URL: https://github.com/apache/pinot/pull/17688#issuecomment-3893971229

   @noob-se7en Thanks for the feedback. The current Kafka-based approach 
remains useful for users who prefer to avoid the operational complexity of 
streaming commits into Apache Iceberg.
   
   Regarding reading directly from the upstream object store: clarification on 
the intended mechanism would help. How would Apache Pinot discover new files 
and track which ones have already been processed?
   
   One possible direction is using Iceberg tables, which provide snapshots, 
manifests with file metadata, and incremental scans for new files. The existing 
microbatch framework would remain largely unchanged: file discovery would come 
from Iceberg snapshots instead of Kafka, while MicroBatchQueueManager would 
continue handling download and processing. I posted a short PEP describing this 
Iceberg-based approach here: https://github.com/apache/pinot/issues/17694. 
Could you clarify whether your suggestion refers to a different model?
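   
   To make the discovery question concrete, here is a minimal sketch of 
snapshot-based file tracking. It does not use the Iceberg or Pinot APIs: the 
`Snapshot` class and `new_files_since` function are hypothetical stand-ins, and 
the only state assumed is the ID of the last snapshot already processed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a table snapshot (not the Iceberg API)."""
    snapshot_id: int
    parent_id: Optional[int]   # None for the first snapshot
    added_files: List[str]     # data files appended by this snapshot


def new_files_since(snapshots: Dict[int, Snapshot],
                    current_id: int,
                    last_processed_id: Optional[int]) -> List[str]:
    """Return files added after last_processed_id, oldest snapshot first.

    Walks the parent chain back from current_id and stops at the checkpoint.
    If the checkpoint is not an ancestor, the walk reaches the root and every
    file in the chain is returned.
    """
    chain = []
    cursor = current_id
    while cursor is not None and cursor != last_processed_id:
        snap = snapshots[cursor]
        chain.append(snap)
        cursor = snap.parent_id

    files: List[str] = []
    for snap in reversed(chain):  # replay in commit order
        files.extend(snap.added_files)
    return files


# Illustrative data only: three snapshots, each appending files.
snapshots = {
    1: Snapshot(1, None, ["a.parquet"]),
    2: Snapshot(2, 1, ["b.parquet", "c.parquet"]),
    3: Snapshot(3, 2, ["d.parquet"]),
}

pending = new_files_since(snapshots, current_id=3, last_processed_id=1)
print(pending)
```

   In this model the checkpoint is a single snapshot ID rather than a Kafka 
offset, and the returned file list is what would be handed to the existing 
download and processing path.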


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

