Hey Dotan,

Samza has the checkpoint-tool.sh script, which can be used to read checkpoints for a given task. The MetricsSnapshotReporter can also be used to read metrics from a Samza job to check its offset progress.
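As a rough illustration of checking offset progress from MetricsSnapshotReporter output, here is a minimal Python sketch that filters offset-related metrics out of JSON snapshots read from stdin. It assumes each snapshot arrives as one JSON object per line, with a "header" map (containing a "source" field) and a "metrics" map of group name to metric name to value. That layout, the "offset" name filter, and the script name are assumptions for illustration, so check them against what your job actually emits.

```python
import json
import sys


def extract_metrics(blob, name_filter="offset"):
    """Return (source, group, metric, value) tuples for matching metric names.

    Assumes the snapshot has a "header" map with a "source" field and a
    "metrics" map of group -> {metric name -> value}. This layout is an
    assumption, not a documented contract.
    """
    snapshot = json.loads(blob)
    source = snapshot.get("header", {}).get("source", "unknown")
    rows = []
    for group, metrics in snapshot.get("metrics", {}).items():
        if not isinstance(metrics, dict):
            continue  # skip anything that is not a group of metrics
        for name, value in metrics.items():
            if name_filter in name:
                rows.append((source, group, name, value))
    return rows


if __name__ == "__main__":
    # Expects one JSON snapshot per line on stdin.
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        try:
            rows = extract_metrics(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        for row in rows:
            print(*row, sep="\t")
```

To use it, pipe the output of kafka-console-consumer.sh, pointed at whatever topic your reporter writes to, into the script (saved under a name of your choosing, e.g. parse_metrics.py).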
I don't believe that there's anything on the open source side that's plug and play, but you could hook Samza up to a metrics system (Graphite, etc.) and do metrics/monitoring that way. A simple hack is to use MetricsSnapshotReporter, then run kafka-console-consumer.sh to read the JSON blobs and parse the metrics that way.

Cheers,
Chris

On Thu, Feb 26, 2015 at 11:03 AM, Dotan Patrich <dot...@fortscale.com> wrote:

> Hi Chris,
>
> Thanks for the info! Very helpful!
> Seems very reasonable. By the way, it all started when I was looking for
> some open source monitoring tool for Samza/Kafka to see which tasks are the
> bottleneck in terms of performance. Do you have any experience with such a
> tool (other than the internal solution developed at LinkedIn)?
> On 26 Feb 2015 20:11, "Chris Riccomini" <criccom...@apache.org> wrote:
>
> > Hey Dotan,
> >
> > The high-level (ZK-based) Kafka consumer (not Samza's) currently uses ZK to
> > store offsets. They (Kafka) are moving away from this when they rewrite
> > their new NIO-based consumer. They will adopt the strategy of storing
> > offsets in a Kafka topic, just like Samza has for years.
> >
> > The main motivation for not storing offsets in ZK is that it imposes
> > artificial limits on how often you can checkpoint due to ZK scalability.
> > For example, if you wanted to checkpoint your offsets after every message,
> > you would hammer away on ZK with thousands of writes per second, just for
> > one consumer. Multiply this out by 100s or 1000s of consumers, and the ZK
> > grid would never be able to keep up. Kafka is actually really good at
> > exactly this kind of workload. In general, using ZK as a KV store is not a
> > great idea.
> >
> > The other benefit of storing offsets in Kafka is that it means Samza
> > doesn't directly depend on ZK (just transitively, through Kafka). This
> > should make operating Samza easier.
> >
> > Cheers,
> > Chris
> >
> > On Wed, Feb 25, 2015 at 10:09 PM, Dotan Patrich <dot...@fortscale.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I was looking for a quick and easy way to monitor task offsets and
> > > stumbled upon this utility:
> > > https://github.com/quantifind/KafkaOffsetMonitor
> > >
> > > It didn't work for me, and what I discovered is that they apparently look
> > > for the consumer list and offsets in ZooKeeper, while Samza stores those
> > > in a Kafka topic.
> > > I tried to think what the downsides of using ZooKeeper to store
> > > offsets could be (performance?) but nothing solid came to mind.
> > >
> > > I guess you guys had some discussions regarding this in the past. What
> > > would be the pros/cons of storing the offsets in a Kafka topic instead of
> > > ZooKeeper?
> > >
> > >
> > > Thanks,
> > > Dotan