josephglanville commented on issue #7030: URL: https://github.com/apache/druid/issues/7030#issuecomment-822199322
I'm interested in taking this up. My research leads me to believe the best way to implement this is to build on the SeekableStream abstraction. On the surface this may look like an impedance mismatch, as Pulsar is primarily built around managing offsets/consumer state on the broker side, but I still think the SeekableStream approach is best for Druid because it best suits its notions of tasks and segments.

The way I think this should be implemented is to have the supervisor create a task per partition of the Pulsar topic. Each task will then use an exclusive, non-durable subscription that consumes from that specific partition. In this way seeking to a specific message ID can be supported cleanly, which is required to support task resumption and idempotency. However, this will result in one segment per task, so users of this indexing service will likely want to enable compaction.

@sijie does this approach sound correct to you? /cc @gianm @jsun98 @dclim as you gentlemen worked on the SeekableStream abstraction and I would appreciate your thoughts.
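To make the proposal above a bit more concrete, here is a minimal, library-free sketch of the task-per-partition plan and of resuming from a stored position. Everything in it is an assumption for illustration: `PulsarTaskPlanSketch`, `TaskSpec`, `planTasks` and `consume` are hypothetical names, not Druid or Pulsar APIs, and a plain `long` stands in for a Pulsar message ID. The only Pulsar detail relied on is that partition `p` of a partitioned topic is addressable as `<topic>-partition-<p>`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these types are Druid or Pulsar APIs.
final class PulsarTaskPlanSketch {

    /** One planned ingestion task: a single partition plus the position to resume after. */
    static final class TaskSpec {
        final String partitionTopic; // e.g. "persistent://public/default/events-partition-0"
        final long startPosition;    // stand-in for a Pulsar message ID; -1 means "earliest"

        TaskSpec(String partitionTopic, long startPosition) {
            this.partitionTopic = partitionTopic;
            this.startPosition = startPosition;
        }
    }

    /** Supervisor side: plan exactly one task per partition of the topic. */
    static List<TaskSpec> planTasks(String topic, int numPartitions, Map<Integer, Long> checkpoints) {
        List<TaskSpec> tasks = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            // Pulsar addresses partition p of a partitioned topic as "<topic>-partition-<p>".
            String partitionTopic = topic + "-partition-" + p;
            // Resume after the last checkpointed position, or from the start if there is none.
            long start = checkpoints.getOrDefault(p, -1L);
            tasks.add(new TaskSpec(partitionTopic, start));
        }
        return tasks;
    }

    /**
     * Task side: emit only messages strictly after the start position, so a restarted
     * task does not re-ingest what it already handled. Positions are assumed ascending.
     * Returns the last position consumed, which would become the next checkpoint.
     */
    static long consume(TaskSpec spec, long[] messagePositions, List<Long> sink) {
        long last = spec.startPosition;
        for (long pos : messagePositions) {
            if (pos > spec.startPosition) {
                sink.add(pos);
                last = pos;
            }
        }
        return last;
    }
}
```

In a real implementation the seek would be done by the Pulsar client against the partition's topic name rather than by filtering in memory; the sketch only illustrates why an explicit start position per partition gives resumable, idempotent tasks.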
