Found the problem. Now we are sub-400ms. While I was setting the per-topic flush interval, I failed to set the scheduler interval via log.default.flush.scheduler.interval.ms.
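For the archives, the broker settings that got us to sub-400ms look
roughly like this (0.7-era property names; "our-topic" stands in for our
real topic name, and the 20ms values are the ones from this thread):

  # per-topic flush interval override, in topic:ms form
  topic.flush.intervals.ms=our-topic:20
  # the piece I missed: the background flusher must also wake up this
  # often, or the 20ms per-topic interval is never actually applied
  log.default.flush.scheduler.interval.ms=20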
I was going to open a JIRA to have this auto-set using the minimum of any
custom flush intervals, but it seems that it's moot in 0.8.

Thanks for the help!
- Bob

On Thu, Nov 8, 2012 at 8:57 AM, Jay Kreps <jay.kr...@gmail.com> wrote:

> Oops, missed what you said--that you had already dropped the flush
> interval. Listen to Neha :-)
>
> -Jay
>
> On Thu, Nov 8, 2012 at 7:57 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
>> Hi Bob,
>>
>> Currently the broker does not hand out messages to consumers until
>> they are flushed to disk, so the flush interval acts as a lower bound
>> on worst-case latency. Setting it lower should fix the problem.
>>
>> This problem has been eliminated in the next release: both the
>> blocking on flush and the fetcher backoff have been removed, which
>> should drop latency to a few ms.
>>
>> -Jay
>>
>> On Wed, Nov 7, 2012 at 5:55 PM, Bob Cotton <bcot...@rallydev.com> wrote:
>>
>>> Hello,
>>>
>>> We have a low-volume topic (~75 msgs/sec) for which we would like a
>>> low propagation delay from producer to consumer.
>>>
>>> We have 3 brokers, each with the default of 4 partitions, for a total
>>> of 12 partitions.
>>> The producer is sync, without compression. There are 8 producers,
>>> each producing 1/8 of the traffic.
>>> We are using the high-level Java consumer, with 4 threads consuming
>>> the topic.
>>>
>>> We wrap each message with a custom Encoder/Decoder that records
>>> currentTimeMillis() on the sender, do the same on the receiver, and
>>> then record the propagation delay. All hosts are time-synced with NTP.
>>>
>>> With the broker settings for flush messages and flush interval left
>>> unset (defaults of 500 msgs and 3000ms), the overall 95th percentile
>>> for propagation is 2,500ms.
>>>
>>> When we adjust the topic flush interval to 20ms, the 95th percentile
>>> drops to 1,700ms.
>>> When we adjust the consumers' "fetcher.backoff.ms" to 10, the 95th
>>> percentile drops to about 970ms.
>>>
>>> We would like this to be sub-500ms.
>>> We could run with fewer partitions and/or more consumer threads.
>>>
>>> Anything glaring about this config? Anything we're missing?
>>>
>>> Thanks
>>> -Bob
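P.S. For anyone who wants to reproduce the measurement: the wrap/unwrap
logic inside our custom Encoder/Decoder is essentially the sketch below.
Class and method names are illustrative, and the Kafka serializer
plumbing is omitted.

  import java.nio.ByteBuffer;

  // Illustrative helper: prefixes each payload with a send timestamp so
  // the consumer can compute propagation delay. Hosts must be NTP-synced
  // for the sender and receiver clocks to be comparable.
  public class TimestampedPayload {

      // sender side: prepend the current wall-clock time to the payload
      public static byte[] wrap(byte[] payload) {
          ByteBuffer buf = ByteBuffer.allocate(8 + payload.length);
          buf.putLong(System.currentTimeMillis());
          buf.put(payload);
          return buf.array();
      }

      // receiver side: propagation delay is receive time minus send time
      public static long delayMillis(byte[] wrapped) {
          long sentAt = ByteBuffer.wrap(wrapped).getLong();
          return System.currentTimeMillis() - sentAt;
      }
  }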