Hey Dotan,

Samza has the checkpoint-tool.sh script, which can be used to read
checkpoints for a given task. The MetricsSnapshotReporter can also be used
to read metrics from a Samza job to check its offset progress.

I don't believe that there's anything on the open source side that's plug
and play, but you could hook up Samza to a metrics system (Graphite, etc.)
and do metrics/monitoring that way.

A simple hack is to use MetricsSnapshotReporter, then run
kafka-console-consumer.sh against the metrics topic to read the JSON blobs,
and parse the metrics that way.
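
For the parsing step, something along these lines might work. It is only a
rough sketch: it assumes each line printed by kafka-console-consumer.sh is
one JSON snapshot with a "header" object (containing a "source" field) and a
"metrics" object mapping group name to {metric name: value}, and that the
metrics you care about have "offset" in their name. The function name
iter_metrics is made up for illustration; check the key names against what
your job actually emits.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_metrics(
    lines: Iterable[str], needle: str = "offset"
) -> Iterator[Tuple[str, str, str, object]]:
    """Yield (source, group, metric, value) for metrics whose name contains `needle`.

    Assumed (not verified) snapshot layout:
      {"header": {"source": ...}, "metrics": {group: {metric: value}}}
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            snapshot = json.loads(line)
        except ValueError:
            # Skip anything that is not a JSON blob (e.g. consumer log noise).
            continue
        if not isinstance(snapshot, dict):
            continue
        source = snapshot.get("header", {}).get("source", "unknown")
        for group, metrics in snapshot.get("metrics", {}).items():
            for name, value in metrics.items():
                if needle in name:
                    yield source, group, name, value
```

You could pipe the console consumer's output into a small script that calls
iter_metrics(sys.stdin) and prints each tuple, or pass a different needle to
pull out other metrics.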

Cheers,
Chris

On Thu, Feb 26, 2015 at 11:03 AM, Dotan Patrich <dot...@fortscale.com>
wrote:

> Hi Chris,
>
> Thanks for the info! Very helpful!
> That seems very reasonable. By the way, it all started when I was looking
> for an open source monitoring tool for Samza/Kafka to see which tasks are
> the bottleneck in terms of performance. Do you have any experience with
> such a tool (other than the internal solution developed at LinkedIn)?
> On 26 Feb 2015 20:11, "Chris Riccomini" <criccom...@apache.org> wrote:
>
> > Hey Dotan,
> >
> > The high-level (ZK-based) Kafka consumer (not Samza's) currently uses ZK
> > to store offsets. They (Kafka) are moving away from this as they write
> > their new NIO-based consumer. They will adopt the strategy of storing
> > offsets in a Kafka topic, just as Samza has for years.
> >
> > The main motivation for not storing offsets in ZK is that it imposes
> > artificial limits on how often you can checkpoint, due to ZK scalability.
> > For example, if you wanted to checkpoint your offsets after every
> > message, you would hammer away on ZK with thousands of writes per second,
> > just for one consumer. Multiply this out by 100s or 1000s of consumers,
> > and the ZK grid would never be able to keep up. Kafka is actually really
> > good at exactly this kind of workload. In general, using ZK as a KV store
> > is not a great idea.
> >
> > The other benefit of storing offsets in Kafka is that it means Samza
> > doesn't directly depend on ZK (just transitively, through Kafka). This
> > should make operating Samza easier.
> >
> > Cheers,
> > Chris
> >
> > On Wed, Feb 25, 2015 at 10:09 PM, Dotan Patrich <dot...@fortscale.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I was looking for a quick and easy way to monitor task offsets and
> > > stumbled upon this utility:
> > > https://github.com/quantifind/KafkaOffsetMonitor
> > >
> > > It didn't work for me, and what I discovered is that it apparently
> > > looks for the consumer list and offsets in ZooKeeper, while Samza
> > > stores those in a Kafka topic.
> > > I tried to think what the downsides of using ZooKeeper to store
> > > offsets could be (performance?), but nothing solid came to mind.
> > >
> > > I guess you guys have had some discussions regarding this in the past.
> > > What would be the pros/cons of storing the offsets in a Kafka topic
> > > instead of ZooKeeper?
> > >
> > >
> > > Thanks,
> > > Dotan
> > >
> >
>
