[ https://issues.apache.org/jira/browse/BEAM-8314?focusedWorklogId=319761&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319761 ]
ASF GitHub Bot logged work on BEAM-8314:
----------------------------------------
Author: ASF GitHub Bot
Created on: 27/Sep/19 19:11
Start Date: 27/Sep/19 19:11
Worklog Time Spent: 10m
Work Description: Ardagan commented on pull request #9679: [BEAM-8314] Add aggregation logic to beam_fn_api metric counter updat…
URL: https://github.com/apache/beam/pull/9679#discussion_r329212108
##########
File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java
##########
@@ -1900,16 +1908,58 @@ private void sendWorkerUpdatesToDataflowService(
// WorkItem.
if (item.getCumulative()) {
item.setCumulative(false);
+ // Group counterUpdates by counterUpdateKey so they can be aggregated before sending to
+ // dataflow service.
+ fnApiCounters
+ .computeIfAbsent(getCounterUpdateKey(item), k -> new ArrayList<>())
+ .add(item);
} else {
// In current world all counters coming from FnAPI are cumulative.
// This is a safety check in case new counter type appears in FnAPI.
throw new UnsupportedOperationException(
"FnApi counters are expected to provide cumulative values."
- + " Please, update convertion to delta logic"
+ + " Please, update conversion to delta logic"
+ " if non-cumulative counter type is required.");
}
+ }
- counterUpdates.add(item);
+ // Aggregates counterUpdates with same counterUpdateKey to single CounterUpdate if possible
+ // so we can avoid excessive I/Os for reporting to dataflow service.
+ for (List<CounterUpdate> counterUpdateList : fnApiCounters.values()) {
+ CounterUpdate head = counterUpdateList.get(0);
+ if (isDistributionCounterUpdate(head)) {
+ if (head.getDistribution() == null) {
Review comment:
These checks would be better done on each value inside
aggregateDistributionCounter, or omitted at this stage.
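
To illustrate the suggestion, here is a minimal sketch of an aggregation helper that validates every value in the group instead of only the head. It is not the code from this pull request: `Update`, `Dist`, and `aggregateAll` are hypothetical, simplified stand-ins for `CounterUpdate`, its distribution payload, and the calling loop, and the fallback of sending a group unaggregated when any value lacks a distribution is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; NOT the Dataflow CounterUpdate API.
class CounterAggregationSketch {

  static final class Dist {
    final long count, sum, min, max;

    Dist(long count, long sum, long min, long max) {
      this.count = count;
      this.sum = sum;
      this.min = min;
      this.max = max;
    }
  }

  static final class Update {
    final String key; // plays the role of getCounterUpdateKey(item)
    final Dist dist;  // may be null, like getDistribution()

    Update(String key, Dist dist) {
      this.key = key;
      this.dist = dist;
    }
  }

  // Checks each value while merging; returns null if the group cannot be aggregated.
  static Update aggregateDistributionCounter(List<Update> updates) {
    if (updates == null || updates.isEmpty()) {
      return null;
    }
    long count = 0, sum = 0, min = Long.MAX_VALUE, max = Long.MIN_VALUE;
    for (Update u : updates) {
      if (u.dist == null) {
        return null; // assumed fallback: caller keeps the group unaggregated
      }
      count += u.dist.count;
      sum += u.dist.sum;
      min = Math.min(min, u.dist.min);
      max = Math.max(max, u.dist.max);
    }
    return new Update(updates.get(0).key, new Dist(count, sum, min, max));
  }

  // Groups updates by key, then aggregates each group where possible.
  static List<Update> aggregateAll(List<Update> items) {
    Map<String, List<Update>> byKey = new LinkedHashMap<>();
    for (Update item : items) {
      byKey.computeIfAbsent(item.key, k -> new ArrayList<>()).add(item);
    }
    List<Update> out = new ArrayList<>();
    for (List<Update> group : byKey.values()) {
      Update merged = aggregateDistributionCounter(group);
      if (merged != null) {
        out.add(merged);
      } else {
        out.addAll(group);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Update> out = aggregateAll(Arrays.asList(
        new Update("a", new Dist(1, 5, 5, 5)),
        new Update("a", new Dist(2, 7, 3, 4)),
        new Update("b", null)));
    System.out.println(out.size()); // prints 2
  }
}
```

With the check inside the helper, the calling loop no longer needs to inspect `head` separately, and a missing distribution later in the list is caught as well.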
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 319761)
Time Spent: 40m (was: 0.5h)
> Beam Fn Api metrics piling causes pipeline to stuck after running for a while
> -----------------------------------------------------------------------------
>
> Key: BEAM-8314
> URL: https://issues.apache.org/jira/browse/BEAM-8314
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Yichi Zhang
> Priority: Blocker
> Fix For: 2.16.0
>
> Attachments: E4UaSUhJJKF.png
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> It seems that StreamingDataflowWorker cannot send metric updates to the
> Dataflow service fast enough. The metrics pile up, memory usage grows, and
> the worker eventually hits excessive memory thrashing/GC, which almost stops
> the pipeline from processing new items.
>
> !E4UaSUhJJKF.png!