josephglanville commented on issue #7030:
URL: https://github.com/apache/druid/issues/7030#issuecomment-822199322


   I'm interested in taking this up.
   
   My research leads me to believe the best way to implement this is to build on 
the SeekableStream abstraction. On the surface this may look like an 
impedance mismatch, as Pulsar is primarily built around managing 
offsets/consumer state on the broker side, but I still think the SeekableStream 
approach is best for Druid because it best suits its notions of tasks and 
segments.
   
   The way I think this should be implemented is to have the supervisor create 
a task per partition of the Pulsar topic. Each task will then use an exclusive, 
non-durable subscription that consumes from that specific partition. This 
way, seeking to a specific message ID can be supported cleanly, which is 
required to support task resumption and idempotency.
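   
   As a rough sketch of the planning step described above (illustrative only, 
not Druid or Pulsar client code): `PartitionTaskPlanner`, `TaskSpec`, the 
`druid-index-N` subscription names, and the string form of the message IDs are 
all hypothetical. The only Pulsar detail assumed is that the partitions of a 
partitioned topic are addressable as `<topic>-partition-<N>`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PartitionTaskPlanner {

    /** Hypothetical description of one indexing task; not a Druid class. */
    public static final class TaskSpec {
        public final String partitionTopic;    // the single partition this task reads
        public final String subscriptionName;  // exclusive, non-durable subscription (name is made up)
        public final String startMessageId;    // checkpointed position, or "earliest" if none

        TaskSpec(String partitionTopic, String subscriptionName, String startMessageId) {
            this.partitionTopic = partitionTopic;
            this.subscriptionName = subscriptionName;
            this.startMessageId = startMessageId;
        }
    }

    /**
     * Builds one task per partition. A task whose partition has a checkpoint
     * resumes from that message ID; otherwise it starts from the earliest message.
     */
    public static List<TaskSpec> plan(String topic, int partitionCount,
                                      Map<Integer, String> checkpoints) {
        List<TaskSpec> tasks = new ArrayList<>();
        for (int p = 0; p < partitionCount; p++) {
            String partitionTopic = topic + "-partition-" + p;
            String start = checkpoints.getOrDefault(p, "earliest");
            tasks.add(new TaskSpec(partitionTopic, "druid-index-" + p, start));
        }
        return tasks;
    }
}
```

   Calling `plan("persistent://public/default/events", 3, Map.of(1, "12:34"))` 
would yield three task specs, with only partition 1 resuming from its 
checkpoint. In a real implementation each task would then open its 
subscription and seek to `startMessageId` before reading.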
   
   This will result in one segment per task, however, so users of this indexing 
service will likely want to enable compaction.
   
   @sijie does this approach sound correct to you?
   
   /cc @gianm @jsun98 @dclim as you gentlemen worked on the SeekableStream 
abstraction and I would appreciate your thoughts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
