[ https://issues.apache.org/jira/browse/BEAM-8314?focusedWorklogId=319894&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319894 ]
ASF GitHub Bot logged work on BEAM-8314:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Sep/19 23:39
            Start Date: 27/Sep/19 23:39
    Worklog Time Spent: 10m
      Work Description: Ardagan commented on pull request #9679: [BEAM-8314] Add aggregation logic to beam_fn_api metric counter updat…
URL: https://github.com/apache/beam/pull/9679#discussion_r329282407

 ##########
 File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java
 ##########
 @@ -1900,16 +1903,47 @@ private void sendWorkerUpdatesToDataflowService(
              // WorkItem.
              if (item.getCumulative()) {
                item.setCumulative(false);
 +              // Group counterUpdates by counterUpdateKey so they can be aggregated before sending to
 +              // dataflow service.
 +              fnApiCounters
 +                  .computeIfAbsent(getCounterUpdateKey(item), k -> new ArrayList<>())
 +                  .add(item);
              } else {
                // In current world all counters coming from FnAPI are cumulative.
                // This is a safety check in case new counter type appears in FnAPI.
                throw new UnsupportedOperationException(
                    "FnApi counters are expected to provide cumulative values."
 -                      + " Please, update convertion to delta logic"
 +                      + " Please, update conversion to delta logic"
                        + " if non-cumulative counter type is required.");
              }
 +          }
 -          counterUpdates.add(item);
 +          // Aggregates counterUpdates with same counterUpdateKey to single CounterUpdate if possible
 +          // so we can avoid excessive I/Os for reporting to dataflow service.
 +          List<CounterUpdateAggregator> availableAggregators =
 +              CounterUpdateAggregator.getAllAvailableCounterUpdateAggregators();
 +          for (List<CounterUpdate> counterUpdateList : fnApiCounters.values()) {
 +            if (counterUpdateList.size() == 0) {
 +              continue;
 +            }
 +            CounterUpdate head = counterUpdateList.get(0);
 +            boolean aggregatable = false;
 +            for (CounterUpdateAggregator aggregator : availableAggregators) {
 +              if (aggregator.isCorrespondingCounterUpdate(head)) {
 +                counterUpdates.add(aggregator.aggregate(counterUpdateList));
 +                aggregatable = true;
 +                break;
 +              }
 +            }
 +            if (!aggregatable) {
 +              LOG.debug(

 Review comment:
   It would also be worth limiting how often we report this log, to avoid spam.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 319894)
    Time Spent: 1h 20m  (was: 1h 10m)

> Beam Fn Api metrics piling causes pipeline to stuck after running for a while
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-8314
>                 URL: https://issues.apache.org/jira/browse/BEAM-8314
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Yichi Zhang
>            Priority: Blocker
>             Fix For: 2.16.0
>
>         Attachments: E4UaSUhJJKF.png
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> It seems that StreamingDataflowWorker is not able to send metric updates
> to the Dataflow service fast enough. The metrics pile up, which causes
> memory usage to increase and eventually leads to excessive memory
> thrashing/GC, almost stopping the pipeline from processing new items.
>
> !E4UaSUhJJKF.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
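
For readers following the thread, the pattern in the quoted diff (group updates by key, aggregate each group, and log when no aggregator applies) together with the reviewer's suggestion to limit how often that log is emitted can be sketched roughly as below. This is only an illustration, not the code from the pull request: the `Update` class, the `"SUM"` kind, the `LogLimiter` helper, summing as the aggregation, and passing non-aggregatable updates through unchanged are all assumptions made for the example. The real `CounterUpdate` and `CounterUpdateAggregator` types in Beam are not used here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

class CounterAggregationSketch {

  /** Hypothetical, simplified stand-in for a counter update: a key, a kind, and a value. */
  static final class Update {
    final String key;
    final String kind;
    final long value;

    Update(String key, String kind, long value) {
      this.key = key;
      this.kind = kind;
      this.value = value;
    }
  }

  /** Hypothetical helper: permits an action at most once per interval. */
  static final class LogLimiter {
    private final long minIntervalMillis;
    private final LongSupplier clockMillis; // injected so the limiter can be tested
    private boolean firedBefore = false;
    private long lastFiredMillis = 0L;

    LogLimiter(long minIntervalMillis, LongSupplier clockMillis) {
      this.minIntervalMillis = minIntervalMillis;
      this.clockMillis = clockMillis;
    }

    synchronized boolean tryAcquire() {
      long now = clockMillis.getAsLong();
      if (firedBefore && now - lastFiredMillis < minIntervalMillis) {
        return false; // still inside the quiet period
      }
      firedBefore = true;
      lastFiredMillis = now;
      return true;
    }
  }

  /**
   * Groups updates by key and collapses each group of kind "SUM" into one update.
   * Groups of any other kind are passed through unchanged (an assumption), and a
   * message is appended to {@code log} only when the limiter allows it.
   */
  static List<Update> aggregate(List<Update> updates, LogLimiter limiter, List<String> log) {
    Map<String, List<Update>> byKey = new LinkedHashMap<>();
    for (Update u : updates) {
      byKey.computeIfAbsent(u.key, k -> new ArrayList<>()).add(u);
    }
    List<Update> out = new ArrayList<>();
    for (List<Update> group : byKey.values()) {
      Update head = group.get(0); // groups are never empty here
      if ("SUM".equals(head.kind)) {
        long total = 0L;
        for (Update u : group) {
          total += u.value;
        }
        out.add(new Update(head.key, head.kind, total));
      } else {
        if (limiter.tryAcquire()) {
          log.add("No aggregator for counter kind " + head.kind);
        }
        out.addAll(group);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    LogLimiter limiter = new LogLimiter(60_000L, System::currentTimeMillis);
    List<String> log = new ArrayList<>();
    List<Update> updates = new ArrayList<>();
    updates.add(new Update("elements", "SUM", 3));
    updates.add(new Update("elements", "SUM", 4));
    updates.add(new Update("latency", "OTHER", 10));
    for (Update u : aggregate(updates, limiter, log)) {
      System.out.println(u.key + " " + u.kind + " " + u.value);
    }
    System.out.println(log);
  }
}
```

Injecting the clock keeps the limiter deterministic in tests. In a real worker the limiter would more likely wrap the existing `LOG.debug` call than write to a list.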