[ https://issues.apache.org/jira/browse/BEAM-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anonymous updated BEAM-8962:
----------------------------
Status: Triage Needed (was: Resolved)
> FlinkMetricContainer causes churn in the JobManager and lets the web frontend
> malfunction
> -----------------------------------------------------------------------------------------
>
> Key: BEAM-8962
> URL: https://issues.apache.org/jira/browse/BEAM-8962
> Project: Beam
> Issue Type: Bug
> Components: runner-flink
> Reporter: Maximilian Michels
> Assignee: Maximilian Michels
> Priority: P2
> Fix For: 2.19.0
>
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> The {{FlinkMetricContainer}} wraps the Beam metric container for reporting
> metrics, but also stores them as Flink accumulators. In high-parallelism
> jobs with over a thousand tasks and many built-in Beam metrics for every Beam
> step, this can add up to over 100MB of serialized data, which is stored in
> the JobManager's ExecutionGraph. This data then cannot even be sent over the
> wire due to the akka.framesize limit (10MB by default), which manifests as
> {{500 Internal Server Error}}s in the web frontend.
> We need to introduce an option to disable the reporting via accumulators.
> Accumulator reporting is mostly useful for batch workloads, where you can
> retrieve the final accumulator values at the end of the job, but it adds a
> lot of memory and network overhead.
> Perhaps we could even turn off the accumulators for streaming jobs, or turn
> them off entirely and make them opt-in.
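
A minimal sketch of the proposed option, assuming a boolean flag that gates the accumulator copy while regular metric reporting stays on. The class {{MetricContainerSketch}}, the flag {{accumulatorsEnabled}}, and the method names are hypothetical and chosen for illustration only; this is not the actual {{FlinkMetricContainer}} code, and it uses plain maps in place of Beam metrics and Flink accumulators.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a metric container; not the actual Beam class. */
final class MetricContainerSketch {
    // Assumed option: when false, nothing is mirrored into accumulators.
    private final boolean accumulatorsEnabled;
    // Values that would go to the regular metric reporters.
    private final Map<String, Long> reported = new HashMap<>();
    // Values that would be shipped to the JobManager as accumulators.
    private final Map<String, Long> accumulators = new HashMap<>();

    MetricContainerSketch(boolean accumulatorsEnabled) {
        this.accumulatorsEnabled = accumulatorsEnabled;
    }

    /** Adds delta to the named counter; mirrors it only if the flag is set. */
    void update(String name, long delta) {
        reported.merge(name, delta, Long::sum);
        if (accumulatorsEnabled) {
            accumulators.merge(name, delta, Long::sum);
        }
    }

    long reportedValue(String name) {
        return reported.getOrDefault(name, 0L);
    }

    int accumulatorCount() {
        return accumulators.size();
    }
}
```

With the flag off, the counters are still reported but no accumulator state is kept, so nothing extra would have to be serialized for the JobManager. Whether the default should be on or off is the open question raised above.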
--
This message was sent by Atlassian Jira
(v8.20.10#820010)