dkoepke commented on PR #15035:
URL: https://github.com/apache/druid/pull/15035#issuecomment-1747278958

   > Something like having the sampler check the latest offset available in 
Kafka first (using RecordSupplier#getLatestSequenceNumber) and then returning 
early once it has read up through that latest message.
   
   Would this cause issues when `useEarliestOffset` is false? Or if the 
earliest offset happens to be close to the latest in a stream that's actively 
receiving data?
   
   From my understanding, the timeout here is basically defining how long a 
stream can be idle (have no data arrive / offsets stay the same). The goal is 
to improve the experience for the web console (and similar UIs) when a stream is 
low volume at the time ingestion is being set up in Druid. Right now, for the 
web console, if there are fewer than 500 (`numRows`) rows, each call to the 
sampler always takes 15 seconds (`timeoutMs`). The web console calls the 
sampler a lot during the normal flow, so this can be pretty slow.
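
   To make the idle-timeout idea concrete, here is a minimal sketch of a sampling loop with both an overall timeout and an idle timeout. It is not the sampler code from Druid or from this PR: `sample`, `simulate`, `idleTimeoutMs`, and the injected clock are hypothetical names used only for illustration, and the 100 ms cost per poll is an assumption of the simulation. Only `numRows` (500) and `timeoutMs` (15 seconds) come from the description above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

public class IdleTimeoutSamplerSketch {

    // Hypothetical sampling loop (NOT Druid's implementation).
    // Stops when numRows rows are collected, when timeoutMs has elapsed overall,
    // or when no new data has arrived for idleTimeoutMs.
    static List<String> sample(
            Supplier<List<String>> poll,  // one poll of the stream; may return an empty list
            LongSupplier clockMs,         // injectable clock so the sketch can be simulated
            int numRows,
            long timeoutMs,
            long idleTimeoutMs) {
        List<String> rows = new ArrayList<>();
        long start = clockMs.getAsLong();
        long lastData = start;
        while (rows.size() < numRows) {
            long now = clockMs.getAsLong();
            if (now - start >= timeoutMs) {
                break; // overall budget exhausted
            }
            if (now - lastData >= idleTimeoutMs) {
                break; // stream has been idle for too long
            }
            List<String> batch = poll.get();
            if (!batch.isEmpty()) {
                lastData = clockMs.getAsLong(); // new data resets the idle window
                for (String row : batch) {
                    if (rows.size() < numRows) {
                        rows.add(row);
                    }
                }
            }
        }
        return rows;
    }

    // Simulates a low-volume stream holding `availableRows` rows, where every poll
    // costs 100 ms of simulated time. Returns {rowsReturned, elapsedMs}.
    static long[] simulate(int availableRows, long idleTimeoutMs) {
        long[] clock = {0};
        int[] remaining = {availableRows};
        Supplier<List<String>> poll = () -> {
            clock[0] += 100;
            if (remaining[0] > 0) {
                remaining[0]--;
                return Collections.singletonList("row");
            }
            return Collections.emptyList();
        };
        List<String> rows = sample(poll, () -> clock[0], 500, 15_000, idleTimeoutMs);
        return new long[] {rows.size(), clock[0]};
    }

    public static void main(String[] args) {
        long[] noIdle = simulate(3, Long.MAX_VALUE); // idle check effectively disabled
        long[] withIdle = simulate(3, 1_000);        // give up after 1 s without data
        System.out.println("no idle timeout:   rows=" + noIdle[0] + " elapsedMs=" + noIdle[1]);
        System.out.println("with idle timeout: rows=" + withIdle[0] + " elapsedMs=" + withIdle[1]);
    }
}
```

   In this simulation a stream with only 3 rows uses up the full 15 seconds when there is no idle timeout, but returns after about 1.3 seconds with a 1-second idle timeout. Both runs return the same 3 rows.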
   
   >  I can't imagine that most users will set any of these config parameters 
in a case like that.
   
   Agreed. The intent of the config in the sampler payload is to allow UIs to 
tweak their settings without users having to redeploy Druid. It's not intended 
to be set (or even seen) by normal end users.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

