Hi,

I read the reasoning about using offsets.retention.minutes at
https://mail-archives.apache.org/mod_mbox/kafka-users/201602.mbox/%3ccaaofhrah8p_a1yebfnh4wzsjwgiqpob_pr6hn4nymtluqqb...@mail.gmail.com%3E,
but can we agree that the original reason behind it is wrong?  In my
view, offsets.retention.minutes is a premature optimization, which goes
against the rule of not optimizing too early.

I understand the reasoning from that email thread that short-lived
consumers could cause a lot of offsets to be stored in Kafka, but what's
the big deal?  I'd much rather have a system that, by default, keeps
offsets around until the data in the topic falls out of retention.  If we
need to optimize, we could keep a setting like offsets.retention.minutes,
where 0 or a negative number (the default) means "keep offsets as long as
the data" and a positive value enables earlier cleanup.  By default, I
think data safety is more important than the small amount of disk space
saved by expiring stored offsets.
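To make the proposed default concrete, here is a minimal Python sketch of the expiry decision. It is an illustration under stated assumptions, not Kafka's actual broker logic: `CommittedOffset`, `should_expire_offset`, and the `log_start_offset` argument (the oldest offset still present in the partition) are hypothetical names. A retention value of 0 or less means "keep the offset while the data it points to is still retained"; a positive value falls back to expiring the offset after that many minutes without a new commit.

```python
from dataclasses import dataclass

MS_PER_MINUTE = 60_000


@dataclass
class CommittedOffset:
    # Hypothetical record of a consumer group's committed position.
    offset: int          # next offset the group will read from the partition
    commit_time_ms: int  # wall-clock time of the most recent commit


def should_expire_offset(committed: CommittedOffset,
                         now_ms: int,
                         retention_minutes: int,
                         log_start_offset: int) -> bool:
    """Sketch of the proposed rule (hypothetical, not Kafka's real code).

    retention_minutes <= 0: keep the committed offset until log retention
    has deleted the data it points to, i.e. the log start offset has moved
    past it.
    retention_minutes > 0: expire it once that many minutes have passed
    since the last commit (time-based expiry).
    """
    if retention_minutes <= 0:
        return committed.offset < log_start_offset
    age_ms = now_ms - committed.commit_time_ms
    return age_ms > retention_minutes * MS_PER_MINUTE
```

With a retention value of 0, a group sitting at offset 100 on a quiet topic keeps that position for as long as log retention has not deleted offset 100, no matter how long ago it last committed.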

For example, what good are Kafka's offset guarantees if the offsets aren't
actually guaranteed to be kept?  You build an app around the assumption that
Kafka will hold onto your offsets, but what if the topic only occasionally
has data going through it?  Imagine that messages still arrive more often
than the topic's log retention period, so the data itself is still there,
but the gaps between messages are longer than offsets.retention.minutes, so
the committed offsets can expire in between.  Should your application just
start over from the auto.offset.reset position because its offsets were
prematurely optimized away, causing your app to break down and process items
multiple times with auto.offset.reset="earliest" or miss several items with
auto.offset.reset="latest"?  That seems like an odd default behavior.
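The failure mode described above can be illustrated with a simplified Python sketch of how a consumer picks its starting position. It assumes a single partition and collapses the client's behavior into one hypothetical function, `resolve_start_offset`; `committed` is `None` when the group has no stored offset, for example because it expired. Only the three auto.offset.reset values "earliest", "latest", and "none" are modeled.

```python
from typing import Optional


def resolve_start_offset(committed: Optional[int],
                         log_start_offset: int,
                         log_end_offset: int,
                         auto_offset_reset: str) -> int:
    """Pick the offset a consumer resumes from (simplified, hypothetical)."""
    # A committed offset that still falls inside the log wins.
    if committed is not None and log_start_offset <= committed <= log_end_offset:
        return committed
    # Otherwise auto.offset.reset decides.
    if auto_offset_reset == "earliest":
        return log_start_offset   # re-read everything still retained
    if auto_offset_reset == "latest":
        return log_end_offset     # jump straight to new messages
    raise ValueError("no usable committed offset and auto.offset.reset=none")


# The group had processed offsets 0..899 of a log spanning 0..999
# (log end offset 1000) before its committed offset (900) expired.
previous_position = 900
for policy in ("earliest", "latest"):
    start = resolve_start_offset(None, 0, 1000, policy)
    reprocessed = max(0, previous_position - start)
    skipped = max(0, start - previous_position)
    print(policy, start, reprocessed, skipped)
```

Losing the committed offset turns a resume at 900 into either a re-read of 900 already-processed messages ("earliest") or 100 silently skipped messages at offsets 900 through 999 ("latest").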

We are running into exactly this problem right now.  We are currently
trying to update offsets.retention.minutes, but can we please change the
default way this is managed?  We use auto.offset.reset="earliest".

Thanks.

-- 
William Grim
Sr. Software Engineer
m: 914 418 4115
e: wg...@signal.co
signal.co

Cut Through the Noise

This e-mail and any files transmitted with it are for the
sole use of the intended recipient(s) and may contain confidential and
privileged information. Any unauthorized use of this email is strictly
prohibited. ©2015 Signal. All rights reserved.
