[ https://issues.apache.org/jira/browse/FLINK-456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fabian Hueske updated FLINK-456:
--------------------------------
Priority: Major (was: Minor)
> Optional runtime statistics collection
> --------------------------------------
>
> Key: FLINK-456
> URL: https://issues.apache.org/jira/browse/FLINK-456
> Project: Flink
> Issue Type: New Feature
> Reporter: GitHub Import
> Labels: github-import
> Fix For: pre-apache
>
>
> The engine should collect job execution statistics (e.g., via accumulators)
> such as:
> - total number of input / output records per operator
> - histogram of input/output ratio of UDF calls
> - histogram of number of input records per reduce / cogroup UDF call
> - histogram of number of output records per UDF call
> - histogram of time spent in UDF calls
> - number of local and remote bytes read (not via accumulators)
> - ...
> These stats should be made available to the user after execution (via the
> web frontend). The purpose of this feature is to ease performance debugging of
> parallel jobs (e.g., to detect data skew).
> It should be possible to deactivate (or activate) the gathering of these
> statistics.
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/456
> Created by: [fhueske|https://github.com/fhueske]
> Labels: enhancement, runtime, user satisfaction,
> Created at: Tue Feb 04 20:32:49 CET 2014
> State: open
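
To make the request above more concrete, here is a minimal sketch of what such a collector might look like. It is plain Python and not Flink code: the class UdfStats, its fields, and the helper skew_ratio are hypothetical names invented for illustration, and the engine's accumulator mechanism is stood in for by simple in-memory counters. It covers the record totals, the per-call histograms, the time histogram, and the on/off switch; bytes read and the input/output ratio histogram are left out.

```python
import time
from collections import Counter


class UdfStats:
    """Hypothetical per-operator statistics collector (not a Flink API)."""

    def __init__(self, enabled=True):
        # Assumption: one flag switches all statistics gathering on or off.
        self.enabled = enabled
        self.records_in = 0                # total input records
        self.records_out = 0               # total output records
        self.inputs_per_call = Counter()   # input count -> number of UDF calls
        self.outputs_per_call = Counter()  # output count -> number of UDF calls
        self.call_millis = Counter()       # elapsed whole ms -> number of UDF calls

    def call(self, udf, records):
        """Run udf on one group of records; record statistics if enabled."""
        if not self.enabled:
            # Disabled: just run the UDF, no timing and no counting.
            return list(udf(records))
        start = time.perf_counter()
        out = list(udf(records))
        elapsed_ms = int((time.perf_counter() - start) * 1000)
        self.records_in += len(records)
        self.records_out += len(out)
        self.inputs_per_call[len(records)] += 1
        self.outputs_per_call[len(out)] += 1
        self.call_millis[elapsed_ms] += 1
        return out


def skew_ratio(records_per_subtask):
    """Largest subtask load divided by the mean load; 1.0 means evenly spread."""
    mean = sum(records_per_subtask) / len(records_per_subtask)
    return max(records_per_subtask) / mean if mean else 0.0
```

As a usage example, wrapping a reduce-style UDF in stats.call(udf, group) for groups of sizes 1, 1 and 6 leaves inputs_per_call with two calls of size 1 and one call of size 6, and skew_ratio([1, 1, 6]) returns 2.25, which is the kind of signal that would point at data skew. Whether such numbers are gathered per call or sampled is a design choice this sketch does not try to settle.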