I don't have a full list of metrics yet, but we want everything that is
related to runtime performance and possible bottlenecks of the system:
all interprocess communication counters, errors, and latencies;
checkpoint sizes and checkpointing latencies; buffer allocations and
releases; and so on.
Since we do the aggregation ourselves, we can produce multiple views of the
same metric: min, max, tp99, tp99.9, top n, etc.
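
To make "multiple views" concrete, here is a rough sketch of how such views
could be derived from raw, unaggregated samples. The class and method names
(MetricViews, percentile, topN) are made up for illustration and are not part
of Flink or of our reporting client; the percentile uses the simple
nearest-rank definition, which is an assumption and only one of several
possible definitions.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not a Flink API. */
final class MetricViews {
    private MetricViews() {}

    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     * p must be in (0, 100]; tp99 is p = 99, tp99.9 is p = 99.9.
     */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samples.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** The n largest samples in descending order (fewer if there are not n). */
    static long[] topN(long[] samples, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int k = Math.min(n, sorted.length);
        long[] top = new long[k];
        for (int i = 0; i < k; i++) {
            top[i] = sorted[sorted.length - 1 - i];
        }
        return top;
    }
}
```

With this definition, max is simply percentile(samples, 100). Views like
these can only be computed from the raw samples, which is why we would like
the metrics to reach us unaggregated.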

Could you point me to the doc/Jira/diff for your change?


On Thu, Apr 14, 2016 at 12:32 PM, Chesnay Schepler <ches...@apache.org>
wrote:

> I'm currently working on a metric system that
> a) exposes several TaskManager metrics
> b) allows gathering metrics in various parts of a task, most notably
> user-defined functions.
>
> The first version makes these metrics available via JMX on each
> TaskManager.
> While a mechanism to make that pluggable is /planned/, there are no details
> on that yet.
>
> I /guess/ once it is merged you should be able to modify one of the
> classes so that the data is directly
> exported to your tool, but I would have to know more about it to make a
> definite assessment.
>
> There are no plans to funnel all those metrics unaggregated through
> Flink's accumulator mechanism;
> only a selection that will be aggregated locally and on the JobManager to
> display in the Dashboard.
>
> Out of curiosity, what metrics are you interested in?
>
>
> On 14.04.2016 20:59, Maxim wrote:
>
>> Hi!
>> I'm looking into integrating Flink into our stack and one of the
>> requirements is to report metrics to an internal system. The current
>> Accumulators are not adequate to provide the visibility that we need to run
>> such a system in production. We want much more information about the
>> internal cluster state and the ability to calculate aggregates ourselves.
>> The core reporting API accepts a metric name, a metric type (gauge, counter,
>> timer) and a set of key-value pairs that act as dimensions.
>>
>> The ideal solution for us would report the metrics through such an API and
>> provide a default binding to the existing Accumulators, but allow
>> overriding it with our internal reporting client.
>>
>> Is this something that could be added to Flink, or are there other plans
>> for monitoring?
>>
>> Thanks!
>>
>> Maxim.
>>
>>
>
