udaysagar2177 commented on PR #17688: URL: https://github.com/apache/pinot/pull/17688#issuecomment-3893971229
@noob-se7en Thanks for the feedback. The current Kafka-based approach remains useful for users who want to avoid the operational complexity of streaming commits into Apache Iceberg.

Regarding reading directly from the upstream object store, it would help to understand the intended mechanism: how would Apache Pinot discover new files and track which ones have already been processed?

One possible direction is to use Iceberg tables, which provide snapshots, manifests with per-file metadata, and incremental scans for newly added files. The existing microbatch framework would remain largely unchanged: files would be discovered from Iceberg snapshots instead of Kafka, while MicroBatchQueueManager would continue to handle download and processing. I posted a short PEP describing this Iceberg-based approach here: https://github.com/apache/pinot/issues/17694.

Could you clarify whether your suggestion refers to a different model?
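To make the "discover new files and track what has been processed" question concrete, here is a minimal Java sketch of snapshot-based discovery. It assumes a simplified, hypothetical model in which each snapshot lists only the data files it appended. Real Iceberg snapshots reference manifest lists, and a real connector would rely on Iceberg's incremental scan support instead. `SnapshotFileDiscoveryDemo`, `Snapshot`, `SnapshotFileDiscovery`, and the `s3://bucket/...` paths are illustrative names only, not Pinot or Iceberg APIs. The tracker checkpoints the last consumed snapshot ID, and each poll returns only files from snapshots committed after it. Those paths would then be handed to the download and processing stage.

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotFileDiscoveryDemo {

  // Hypothetical stand-in for an Iceberg snapshot: an ID plus the data files
  // that snapshot appended. Real snapshots point to manifest lists instead.
  record Snapshot(long snapshotId, List<String> addedFiles) {}

  // Hypothetical tracker that checkpoints the last snapshot it consumed.
  static final class SnapshotFileDiscovery {
    // Null until a poll has consumed at least one snapshot.
    private Long lastProcessedSnapshotId = null;

    // Walks the snapshot history in commit order and returns files added after
    // the checkpoint. Snapshot IDs are not ordered, so position in the history,
    // not the numeric ID, decides what is "new". If the checkpointed snapshot
    // has been expired from the history, this sketch returns nothing.
    List<String> discoverNewFiles(List<Snapshot> historyInCommitOrder) {
      List<String> newFiles = new ArrayList<>();
      boolean pastCheckpoint = lastProcessedSnapshotId == null;
      for (Snapshot snapshot : historyInCommitOrder) {
        if (pastCheckpoint) {
          newFiles.addAll(snapshot.addedFiles());
          lastProcessedSnapshotId = snapshot.snapshotId(); // advance checkpoint
        } else if (snapshot.snapshotId() == lastProcessedSnapshotId) {
          pastCheckpoint = true; // every later snapshot is unprocessed
        }
      }
      return newFiles;
    }

    Long lastProcessedSnapshotId() {
      return lastProcessedSnapshotId;
    }
  }

  public static void main(String[] args) {
    SnapshotFileDiscovery discovery = new SnapshotFileDiscovery();
    List<Snapshot> history = new ArrayList<>(List.of(
        new Snapshot(9001L, List.of("s3://bucket/data/a.parquet")),
        new Snapshot(4217L, List.of("s3://bucket/data/b.parquet"))));

    // First poll: nothing is checkpointed yet, so every file is new.
    // prints [s3://bucket/data/a.parquet, s3://bucket/data/b.parquet]
    System.out.println(discovery.discoverNewFiles(history));

    // A new commit lands; the next poll returns only its file.
    history.add(new Snapshot(7730L, List.of("s3://bucket/data/c.parquet")));
    // prints [s3://bucket/data/c.parquet]
    System.out.println(discovery.discoverNewFiles(history));
  }
}
```

In this sketch the checkpointed snapshot ID plays roughly the role that a committed offset plays in the Kafka-based flow: it is the only state needed to resume discovery after a restart.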
