dkoepke commented on PR #15035: URL: https://github.com/apache/druid/pull/15035#issuecomment-1747278958
> Something like having the sampler check the latest offset available in Kafka first (using RecordSupplier#getLatestSequenceNumber) and then returning early once it has read up through that latest message.

Would this cause issues when `useEarliestOffset` is false? Or if the earliest offset happens to be close to the latest in a stream that's actively receiving data? From my understanding, the timeout here is basically defining how long a stream can be idle (have no data arrive / offsets stay the same).

The goal is to improve the experience for the web console (and similar UIs) when a stream is low volume at the time ingestion is being set up in Druid. Right now, for the web console, if there are fewer than 500 (`numRows`) rows, each call to the sampler always takes 15 seconds (`timeoutMs`). The web console calls the sampler a lot during the normal flow, so this can be pretty slow.

> I can't imagine that most users will set any of these config parameters in a case like that.

Agreed. The intent of the config in the sampler payload is to allow UIs to tweak their settings without users having to redeploy Druid. It's not intended to be set (or even seen) by normal end users.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
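
For illustration only (this is not the code in the PR): a minimal Java sketch of the idle-timeout idea discussed in the comment, run against a simulated clock so it is deterministic. `numRows` and `timeoutMs` mirror the sampler parameters named above; `maxIdleMs`, `IdleTimeoutSamplerSketch`, `sample`, and `Result` are hypothetical names introduced here, and the poll model (a fixed `pollMs` per poll) is an assumption.

```java
// Hypothetical sketch: stop sampling when enough rows are read, when the
// overall timeout expires, or when the stream has been idle for too long.
final class IdleTimeoutSamplerSketch {

    /** Outcome of one simulated sampling run. */
    static final class Result {
        public final int rowsRead;
        public final long elapsedMs;
        public final String stopReason; // "numRows", "timeoutMs", or "idle"

        Result(int rowsRead, long elapsedMs, String stopReason) {
            this.rowsRead = rowsRead;
            this.elapsedMs = elapsedMs;
            this.stopReason = stopReason;
        }
    }

    /**
     * @param rowsPerPoll rows returned by each successive poll; polls beyond
     *                    the end of the array return no rows (idle stream)
     * @param pollMs      simulated duration of one poll, must be positive
     * @param numRows     stop once this many rows have been read
     * @param timeoutMs   hard upper bound on the whole sampling call
     * @param maxIdleMs   (assumed parameter) stop once no rows have arrived
     *                    for this long
     */
    static Result sample(int[] rowsPerPoll, long pollMs, int numRows,
                         long timeoutMs, long maxIdleMs) {
        if (pollMs <= 0) {
            throw new IllegalArgumentException("pollMs must be positive");
        }
        int rowsRead = 0;
        long now = 0;        // simulated clock, in milliseconds
        long lastRowAt = 0;  // simulated time at which rows last arrived
        int poll = 0;

        while (true) {
            if (rowsRead >= numRows) {
                return new Result(rowsRead, now, "numRows");
            }
            if (now >= timeoutMs) {
                return new Result(rowsRead, now, "timeoutMs");
            }
            if (now - lastRowAt >= maxIdleMs) {
                return new Result(rowsRead, now, "idle");
            }
            int batch = poll < rowsPerPoll.length ? rowsPerPoll[poll] : 0;
            poll++;
            now += pollMs; // each poll advances the simulated clock
            if (batch > 0) {
                rowsRead = Math.min(numRows, rowsRead + batch);
                lastRowAt = now;
            }
        }
    }

    public static void main(String[] args) {
        // Low-volume stream: 10 rows arrive in the first poll, then nothing.
        Result withIdle = sample(new int[] {10}, 1000, 500, 15000, 2000);
        Result withoutIdle = sample(new int[] {10}, 1000, 500, 15000, 15000);
        System.out.println(withIdle.stopReason + " after " + withIdle.elapsedMs + " ms");
        System.out.println(withoutIdle.stopReason + " after " + withoutIdle.elapsedMs + " ms");
    }
}
```

Under these assumptions, a stream that delivers 10 rows and then goes quiet stops after 3000 simulated ms with a 2000 ms idle limit, rather than running to the full 15000 ms `timeoutMs`. Setting `maxIdleMs` equal to `timeoutMs` reproduces the always-wait behavior described above.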
