[ https://issues.apache.org/jira/browse/BEAM-8314?focusedWorklogId=319876&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319876 ]
ASF GitHub Bot logged work on BEAM-8314:
----------------------------------------
Author: ASF GitHub Bot
Created on: 27/Sep/19 22:46
Start Date: 27/Sep/19 22:46
Worklog Time Spent: 10m
Work Description: angoenka commented on pull request #9679: [BEAM-8314]
Add aggregation logic to beam_fn_api metric counter updat…
URL: https://github.com/apache/beam/pull/9679#discussion_r329274916
##########
File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/CounterUpdateAggregator.java
##########
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.dataflow.worker;
+
+import com.google.api.services.dataflow.model.CounterUpdate;
+import java.util.Arrays;
+import java.util.List;
+import org.apache.beam.runners.dataflow.worker.MetricsToCounterUpdateConverter.Kind;
+
+/**
+ * CounterUpdateAggregator performs aggregation over a list of CounterUpdate and return combined
+ * result.
+ */
+interface CounterUpdateAggregator {
+
+ /**
+ * Implementation of aggregate function should provide logic to take the list of CounterUpdates
+ * and return single combined CounterUpdate object. Reporting the aggregated result to Dataflow
+ * should have same effect as reporting the elements in list individually to Dataflow.
+ *
+ * @param counterUpdates CounterUpdates to aggregate.
+ * @return Aggregated CounterUpdate.
+ */
+ CounterUpdate aggregate(List<CounterUpdate> counterUpdates);
+
+ /**
+ * CounterUpdate {@link
+ * org.apache.beam.runners.dataflow.worker.MetricsToCounterUpdateConverter.Kind kind}
+ */
+ Kind getKind();
+
+ /**
+ * Check whether the aggregator is able to perform aggregation on the kind of CounterUpdate.
+ *
+ * @param counterUpdate the counterUpdate object to check.
+ * @return true if the aggregator can perform aggregation over these type of CounterUpdate.
+ */
+ default boolean isCorrespondingCounterUpdate(CounterUpdate counterUpdate) {
+   return (counterUpdate.getStructuredNameAndMetadata() != null
+           && counterUpdate.getStructuredNameAndMetadata().getMetadata() != null
+           && getKind()
+               .toString()
+               .equals(counterUpdate.getStructuredNameAndMetadata().getMetadata().getKind()))
+       || (counterUpdate.getNameAndKind() != null
+           && getKind().toString().equals(counterUpdate.getNameAndKind().getKind()));
+ }
+
+ static List<CounterUpdateAggregator> getAllAvailableCounterUpdateAggregators() {
Review comment:
We don't need to expose this if we have a single method.
In fact, we can just have a single class, `CounterUpdateAggregators`, that
exposes that method, and remove the logic from the interface.
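
A minimal sketch of what that split could look like. It uses simplified stand-in types rather than the real Dataflow `CounterUpdate` model, so `SimpleCounterUpdate`, `SimpleCounterUpdateAggregator`, and `SumAggregator` are hypothetical names introduced only for illustration, and the `"SUM"` kind string is an assumed example value:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for CounterUpdate: just a kind and a long value. */
final class SimpleCounterUpdate {
  final String kind;
  final long value;

  SimpleCounterUpdate(String kind, long value) {
    this.kind = kind;
    this.value = value;
  }
}

/** Stand-in for the interface under review, with no static factory method on it. */
interface SimpleCounterUpdateAggregator {
  SimpleCounterUpdate aggregate(List<SimpleCounterUpdate> updates);

  String getKind();
}

/** Hypothetical aggregator that adds up the values of the updates it is given. */
final class SumAggregator implements SimpleCounterUpdateAggregator {
  @Override
  public SimpleCounterUpdate aggregate(List<SimpleCounterUpdate> updates) {
    long total = 0;
    for (SimpleCounterUpdate update : updates) {
      total += update.value;
    }
    return new SimpleCounterUpdate(getKind(), total);
  }

  @Override
  public String getKind() {
    return "SUM"; // assumed kind name, for illustration only
  }
}

/** Holder class: the only place that knows which aggregators exist. */
final class CounterUpdateAggregators {
  private static final List<SimpleCounterUpdateAggregator> ALL =
      Collections.unmodifiableList(
          Arrays.<SimpleCounterUpdateAggregator>asList(new SumAggregator()));

  private CounterUpdateAggregators() {} // not instantiable

  static List<SimpleCounterUpdateAggregator> getAllAvailableCounterUpdateAggregators() {
    return ALL;
  }
}
```

With this shape the interface only describes what an aggregator does, and callers get the list from `CounterUpdateAggregators.getAllAvailableCounterUpdateAggregators()`.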
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
Issue Time Tracking
-------------------
Worklog Id: (was: 319876)
Time Spent: 1h (was: 50m)
> Beam Fn Api metrics piling causes pipeline to stuck after running for a while
> -----------------------------------------------------------------------------
>
> Key: BEAM-8314
> URL: https://issues.apache.org/jira/browse/BEAM-8314
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Yichi Zhang
> Priority: Blocker
> Fix For: 2.16.0
>
> Attachments: E4UaSUhJJKF.png
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> It seems that StreamingDataflowWorker is not able to report metric updates to
> the Dataflow service fast enough. The metrics pile up, memory usage grows, and
> the worker eventually suffers excessive memory thrashing/GC, which almost
> stops the pipeline from processing new items.
>
> !E4UaSUhJJKF.png!
--
This message was sent by Atlassian Jira
(v8.3.4#803005)