Hi Chris,

Thanks for the info, very helpful! It seems very reasonable. By the way, it all started when I was looking for an open-source monitoring tool for Samza/Kafka to see which tasks are the performance bottleneck. Do you have any experience with such a tool (other than the internal solution developed at LinkedIn)?

On 26 Feb 2015 20:11, "Chris Riccomini" <criccom...@apache.org> wrote:
> Hey Dotan,
>
> The high-level (ZK-based) Kafka consumer (not Samza's) currently uses ZK to
> store offsets. They (Kafka) are moving away from this with the rewrite of
> their new NIO-based consumer, which will adopt the strategy of storing
> offsets in a Kafka topic, just like Samza has for years.
>
> The main motivation for not storing offsets in ZK is that it imposes
> artificial limits on how often you can checkpoint, due to ZK scalability.
> For example, if you wanted to checkpoint your offsets after every message,
> you would hammer ZK with thousands of writes per second, just for
> one consumer. Multiply this out by 100s or 1000s of consumers, and the ZK
> grid would never be able to keep up. Kafka is actually really good at
> exactly this kind of workload. In general, using ZK as a KV store is not a
> great idea.
>
> The other benefit of storing offsets in Kafka is that Samza doesn't
> directly depend on ZK (only transitively, through Kafka). This
> should make operating Samza easier.
>
> Cheers,
> Chris
>
> On Wed, Feb 25, 2015 at 10:09 PM, Dotan Patrich <dot...@fortscale.com>
> wrote:
>
> > Hi,
> >
> > I was looking for a quick and easy way to monitor task offsets and
> > stumbled upon this utility:
> > https://github.com/quantifind/KafkaOffsetMonitor
> >
> > It didn't work for me, and what I discovered is that it apparently looks
> > for the consumer list and offsets in ZooKeeper, while Samza stores those
> > in a Kafka topic.
> > I tried to think what the downsides of using ZooKeeper to store
> > offsets could be (performance?), but nothing solid came to mind.
> >
> > I guess you had some discussions about this in the past. What
> > are the pros/cons of storing the offsets in a Kafka topic instead of
> > ZooKeeper?
> >
> > Thanks,
> > Dotan
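To make the "offsets in a Kafka topic" idea concrete, here is a minimal in-memory sketch, assuming a log-compacted checkpoint topic keyed by (task, topic, partition). The class name `CheckpointLog`, its methods, and the key layout are hypothetical illustrations, not Samza's actual checkpoint format or API. It shows why per-message checkpointing is cheap (each checkpoint is just an append) and how a monitoring tool could recover the latest offsets by replaying the topic, which is what a ZK-based tool like KafkaOffsetMonitor does not do.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical checkpoint key: (task name, input topic, partition).
# This is NOT Samza's real checkpoint schema, just an illustration.
Key = Tuple[str, str, int]


@dataclass
class CheckpointLog:
    """In-memory stand-in for an append-only, log-compacted Kafka topic."""

    records: List[Tuple[Key, int]] = field(default_factory=list)

    def checkpoint(self, task: str, topic: str, partition: int, offset: int) -> None:
        # A checkpoint is a plain append; there is no read-modify-write
        # of a shared node, which is the kind of load Kafka handles well.
        self.records.append(((task, topic, partition), offset))

    def latest_offsets(self) -> Dict[Key, int]:
        # Replaying the log and keeping the last value per key yields the
        # current offsets; a monitor would read the topic the same way.
        latest: Dict[Key, int] = {}
        for key, offset in self.records:
            latest[key] = offset
        return latest

    def compact(self) -> None:
        # Mimics log compaction: keep only the most recent record per key.
        self.records = list(self.latest_offsets().items())


log = CheckpointLog()
for offset in range(1, 1001):  # checkpoint after every one of 1000 messages
    log.checkpoint("Partition 0", "events", 0, offset)
log.checkpoint("Partition 1", "events", 1, 42)

print(len(log.records))                                   # 1001
print(log.latest_offsets()[("Partition 0", "events", 0)])  # 1000
log.compact()
print(len(log.records))                                   # 2
```

Compaction is what keeps this approach bounded: the topic may receive thousands of checkpoint writes per second, but after compaction it retains only one record per key, so replaying it on restart (or from a monitor) stays cheap.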