Yes, agree Ben.

More than the collected metrics, my question is more how to "expose"/"push" those metrics.

Imagine, I have a pipeline executed using the Spark runner on a Spark cluster. Now, I change this pipeline to use the dataflow runner on Google Cloud Dataflow service. I have to completely change the way of getting metrics. Optionally, if I'm able to define some like --metric-appender=foo.cfg containing additionally to the execution engine specific layer, a target where I can push the metric (like elasticsearch or kafka), I can implement a simple and generic way of harvesting metrics. It's totally fine to have some metric specific to Spark when using the Spark runner and others specific to Dataflow when using the dataflow runner, my point is more: how I can send the metrics to my global monitoring/reporting layer. Somehow, we can see the metrics like meta side output of the pipeline, send to a target sink.

I don't want to change or bypass the execution engine specific, I mean provide a way for the user to target his system (optional).

Regards
JB

On 02/18/2017 06:15 PM, Ben Chambers wrote:
The question is how much of metrics has to be in the runner and how much
can be shared. So far we share the API the user uses to report metrics -
this is the most important part since it is required for pipelines to be
portable.

The next piece that could be shared is something related to reporting. But,
implementing logically correct metrics requires the execution engine to be
involved, since it depends on how and which bundles are retried. What I'm
not sure about is how much can be shared and/or made available for these
runners vs. how much is tied to the execution engine.

On Sat, Feb 18, 2017, 8:52 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

For Spark, I fully agree. My point is more when the execution engine or
runner doesn't provide anything or we have to provide a generic way of
harvesting/pushing metrics.

It could at least be a documentation point. Actually, I'm evaluation the
monitoring capabilities of the different runners.

Regards
JB

On 02/18/2017 05:47 PM, Amit Sela wrote:
That's what I don't understand - why would we want that ?
Taking on responsibilities in the "stack" should have a good reason.

Someone choosing to run Beam on Spark/Flink/Apex would have to take care
of
installing those clusters, right ? perhaps providing them a resilient
underlying FS ? and if he wants, setup monitoring (which even with the
API
proposed he'd have to do).

I just don't see why it should be a part of the runner and/or Metrics
API.

On Sat, Feb 18, 2017 at 6:35 PM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

Good point.

In Decanter, it's what I named a "scheduled collector". So, yes, the
adapter will periodically harvest metric to push.

Regards
JB

On 02/18/2017 05:30 PM, Amit Sela wrote:
First issue with "push" metrics plugin - what if the runner's
underlying
reporting mechanism is "pull" ? Codahale ScheduledReporter will sample
the
values every X and send to ...
So any runner using a "pull-like" would use an adapter ?

On Sat, Feb 18, 2017 at 6:27 PM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

Hi Ben,

ok it's what I thought. Thanks for the clarification.

+1 for the plugin-like "push" API (it's what I have in mind too ;)).
I will start a PoC for discussion next week.

Regards
JB

On 02/18/2017 05:17 PM, Ben Chambers wrote:
The runner can already report metrics during pipeline execution so it
is
usable for monitoring.

The pipeline result can be used to query metrics during pipeline
execution,
so a first version of reporting to other systems is to periodically
pulls
metrics from the runner with that API.

We may eventually want to provide a plugin-like API to get the runner
to
push metrics more directly to other metrics stores. This layer needs
some
thought since it has to handle the complexity of attempted/committed
metrics to be consistent with the model.



On Sat, Feb 18, 2017, 5:44 AM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

Hi Amit,

before Beam, I didn't mind about portability ;) So I used the Spark
approach.

But, now, as a Beam user, I would expect a generic way to deal with
metric whatever the runner would be.

Today, you are right: I'm using the solution provided by the
execution
engine. That's the current approach and it works fine. And it's up to
me
to leverage (for intance Accumulators) it with my own system.

My thought is more to provide a generic way. It's only a discussion
for
now ;)

Regards
JB

On 02/18/2017 02:38 PM, Amit Sela wrote:
On Sat, Feb 18, 2017 at 10:16 AM Jean-Baptiste Onofré <
j...@nanthrax.net>
wrote:

Hi Amit,

my point is: how do we provide metric today to end user and how can
they
use it to monitor a running pipeline ?

Clearly the runner is involved, but, it should behave the same way
for
all runners. Let me take an example.
On my ecosystem, I'm using both Flink and Spark with Beam, some
pipelines on each. I would like to get the metrics for all
pipelines
to
my monitoring backend. If I can "poll" from the execution engine
metric
backend to my system that's acceptable, but it's an overhead of
work.
Having a generic metric reporting layer would allow us to have a
more
common way. If the user doesn't provide any reporting sink, then we
use
the execution backend metric layer. If provided, we use the
reporting
sink.

How did you do it before Beam ? I that for Spark you reported it's
native
metrics via Codahale Reporter and Accumulators were visible in the
UI,
and
the Spark runner took it a step forward to make it all visible via
Codahale. Assuming Flink does something similar, it all belongs to
runner
setup/configuration.


About your question: you are right, it's possible to update a
collector
or appender without impacting anything else.

Regards
JB

On 02/17/2017 10:38 PM, Amit Sela wrote:
@JB I think what you're suggesting is that Beam should provide a
"Metrics
Reporting" API as well, and I used to think like you, but the
more I
thought of that the more I tend to disagree now.

The SDK is for users to author pipelines, so Metrics are for
user-defined
metrics (in contrast to runner metrics).

The Runner API is supposed to help different backends to integrate
with
Beam to allow users to execute those pipeline on their favourite
backend. I
believe the Runner API has to provide restrictions/demands that
are
just
enough so a runner could execute a Beam pipeline as best it can,
and
I'm
afraid that this would demand runner authors to do work that is
unnecessary.
This is also sort of "crossing the line" into the runner's domain
and
"telling it how to do" instead of what, and I don't think we want
that.

I do believe however that runner's should integrate the Metrics
into
their
own metrics reporting system - but that's for the runner author to
decide.
Stas did this for the Spark runner because Spark doesn't report
back
user-defined Accumulators (Spark's Aggregators) to it's Metrics
system.

On a curious note though, did you use an OSGi service per
event-type ?
so
you can upgrade specific event-handlers without taking down the
entire
reporter ? but that's really unrelated to this thread :-) .



On Fri, Feb 17, 2017 at 8:36 PM Ben Chambers
<bchamb...@google.com.invalid>
wrote:

It don't think it is possible for there to be a general mechanism
for
pushing metrics out during the execution of a pipeline. The
Metrics
API
suggests that metrics should be reported as values across all
attempts
and
values across only successful attempts. The latter requires
runner
involvement to ensure that a given metric value is atomically
incremented
(or checkpointed) when the bundle it was reported in is
committed.

Aviem has already implemented Metrics support for the Spark
runner. I
am
working on support for the Dataflow runner.

On Fri, Feb 17, 2017 at 7:50 AM Jean-Baptiste Onofré <
j...@nanthrax.net>
wrote:

Hi guys,

As I'm back from vacation, I'm back on this topic ;)

It's a great discussion, and I think about the Metric IO
coverage,
it's
good.

However, there's a point that we discussed very fast in the
thread
and
I
think it's an important one (maybe more important than the
provided
metrics actually in term of roadmap ;)).

Assuming we have pipelines, PTransforms, IOs, ... using the
Metric
API,
how do we expose the metrics for the end-users ?

A first approach would be to bind a JMX MBean server by the
pipeline
and
expose the metrics via MBeans. I don't think it's a good idea for
the
following reasons:
1. It's not easy to know where the pipeline is actually executed,
and
so, not easy to find the MBean server URI.
2. For the same reason, we can have port binding error.
3. If it could work for unbounded/streaming pipelines (as they
are
always "running"), it's not really applicable for bounded/batch
pipelines as their lifetime is "limited" ;)

So, I think the "push" approach is better: during the execution,
a
pipeline "internally" collects and pushes the metric to a
backend.
The "push" could a kind of sink. For instance, the metric
"records"
can
be sent to a Kafka topic, or directly to Elasticsearch or
whatever.
The metric backend can deal with alerting, reporting, etc.

Basically, we have to define two things:
1. The "appender" where the metrics have to be sent (and the
corresponding configuration to connect, like Kafka or
Elasticsearch
location)
2. The format of the metric data (for instance, json format).

In Apache Karaf, I created something similar named Decanter:







http://blog.nanthrax.net/2015/07/monitoring-and-alerting-with-apache-karaf-decanter/

http://karaf.apache.org/manual/decanter/latest-1/

Decanter provides collectors that harvest the metrics (like JMX
MBean
attributes, log messages, ...). Basically, for Beam, it would be
directly the Metric API used by pipeline parts.
Then, the metric record are send to a dispatcher which send the
metric
records to an appender. The appenders store or send the metric
records
to a backend (elasticsearc, cassandra, kafka, jms, reddis, ...).

I think it would make sense to provide the configuration and
Metric
"appender" via the pipeline options.
As it's not really runner specific, it could be part of the
metric
API
(or SPI in that case).

WDYT ?

Regards
JB

On 02/15/2017 09:22 AM, Stas Levin wrote:
+1 to making the IO metrics (e.g. producers, consumers)
available
as
part
of the Beam pipeline metrics tree for debugging and visibility.

As it has already been mentioned, many IO clients have a metrics
mechanism
in place, so in these cases I think it could be beneficial to
mirror
their
metrics under the relevant subtree of the Beam metrics tree.

On Wed, Feb 15, 2017 at 12:04 AM Amit Sela <
amitsel...@gmail.com>
wrote:

I think this is a great discussion and I'd like to relate to
some
of
the
points raised here, and raise some of my own.

First of all I think we should be careful here not to cross
boundaries.
IOs
naturally have many metrics, and Beam should avoid "taking
over"
those.
IO
metrics should focus on what's relevant to the Pipeline:
input/output
rate,
backlog (for UnboundedSources, which exists in bytes but for
monitoring
purposes we might want to consider #messages).

I don't agree that we should not invest in doing this in
Sources/Sinks
and
going directly to SplittableDoFn because the IO API is familiar
and
known,
and as long as we keep it should be treated as a first class
citizen.

As for enable/disable - if IOs consider focusing on
pipeline-related
metrics I think we should be fine, though this could also
change
between
runners as well.

Finally, considering "split-metrics" is interesting because on
one
hand
it
affects the pipeline directly (unbalanced partitions in Kafka
that
may
cause backlog) but this is that fine-line of responsibilities
(Kafka
monitoring would probably be able to tell you that partitions
are
not
balanced).

My 2 cents, cheers!

On Tue, Feb 14, 2017 at 8:46 PM Raghu Angadi
<rang...@google.com.invalid

wrote:

On Tue, Feb 14, 2017 at 9:21 AM, Ben Chambers
<bchamb...@google.com.invalid

wrote:


* I also think there are data source specific metrics that a
given
IO
will
want to expose (ie, things like kafka backlog for a topic.)


UnboundedSource has API for backlog. It is better for
beam/runners
to
handle backlog as well.
Of course there will be some source specific metrics too
(errors,
i/o
ops
etc).




--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Reply via email to