[ 
https://issues.apache.org/jira/browse/BEAM-5355?focusedWorklogId=164419&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-164419
 ]

ASF GitHub Bot logged work on BEAM-5355:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Nov/18 17:01
            Start Date: 09/Nov/18 17:01
    Worklog Time Spent: 10m 
      Work Description: swegner commented on a change in pull request #6987: 
[BEAM-5355] Prevent creating metrics of the same name multiple times
URL: https://github.com/apache/beam/pull/6987#discussion_r232324295
 
 

 ##########
 File path: sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/GroupByKeyLoadTest.java
 ##########
 @@ -83,15 +83,14 @@ private GroupByKeyLoadTest(String[] args) throws IOException {
   void loadTest() throws IOException {
     Optional<SyntheticStep> syntheticStep = createStep(options.getStepOptions());
 
-    PCollection<KV<byte[], byte[]>> input =
-        pipeline.apply(SyntheticBoundedIO.readFrom(sourceOptions));
+    PCollection<KV<byte[], byte[]>> input = pipeline
+        .apply(SyntheticBoundedIO.readFrom(sourceOptions))
+        .apply(ParDo.of(new MetricsMonitor(METRICS_NAMESPACE)));
 
 Review comment:
   I just took a look at the current 
[MetricsMonitor](https://github.com/apache/beam/blob/8a88e72f293ef7f9be6c872aa0dda681458c7ca5/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/metrics/MetricsMonitor.java#L27)
 to get a better understanding of the intent.
   
   The `runtime` counter reports when this `DoFn` starts and finishes executing, 
but that covers only a subset of the Pipeline's execution. If you are trying to 
get pipeline start/stop time, you'll need to inject some counter calculation at 
the root(s) of the execution graph and at the sink(s). 
   
   If you only care about Dataflow, you could consume some of the existing 
system counters that are calculated automatically. Some details are available at: 
[Using the Cloud Dataflow Monitoring Interface: Total Execution 
Time](https://cloud.google.com/dataflow/pipelines/dataflow-monitoring-intf#total-execution-time).
   
   However, I believe these load tests are run on many runners, so using 
Dataflow-native counters is likely not an option. In the Portability SDKs, we 
plan to have some counters reported by the SDK to all runners. /cc @ajamato 
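   
   To make the root/sink suggestion above concrete, here is a minimal, 
runner-independent sketch in plain Java. It is not Beam code and not the 
`MetricsMonitor` from the PR: `PipelineTimeSketch`, `MinMax`, `TimeProbe` and 
`elapsedMillis` are hypothetical names, `MinMax` only stands in for a 
distribution-style metric, and the clock is faked so the arithmetic can be 
followed. The idea is that one probe records timestamps at the root, another 
at the sink, and total time is the latest sink timestamp minus the earliest 
root timestamp.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical sketch; none of these types come from the Beam SDK. */
public class PipelineTimeSketch {

  /** Stand-in for a distribution metric that keeps only min and max. */
  static final class MinMax {
    private final AtomicLong min = new AtomicLong(Long.MAX_VALUE);
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    void update(long value) {
      min.accumulateAndGet(value, Math::min);
      max.accumulateAndGet(value, Math::max);
    }

    long min() {
      return min.get();
    }

    long max() {
      return max.get();
    }
  }

  /** Pass-through step: records the current time, returns the element unchanged. */
  static final class TimeProbe {
    private final MinMax seen;
    private final LongSupplier clock;

    TimeProbe(MinMax seen, LongSupplier clock) {
      this.seen = seen;
      this.clock = clock;
    }

    <T> T process(T element) {
      seen.update(clock.getAsLong());
      return element;
    }
  }

  /** Total time = latest timestamp at the sink minus earliest timestamp at the root. */
  static long elapsedMillis(MinMax atRoot, MinMax atSink) {
    return atSink.max() - atRoot.min();
  }

  public static void main(String[] args) {
    long[] now = {1_000L}; // fake clock, in milliseconds (assumption for the demo)
    LongSupplier clock = () -> now[0];

    MinMax rootTimes = new MinMax();
    MinMax sinkTimes = new MinMax();
    TimeProbe rootProbe = new TimeProbe(rootTimes, clock);
    TimeProbe sinkProbe = new TimeProbe(sinkTimes, clock);

    for (String element : new String[] {"a", "b", "c"}) {
      String read = rootProbe.process(element); // probe right after the source
      now[0] += 40; // simulated work in the intermediate steps
      sinkProbe.process(read); // probe right before the sink
      now[0] += 10; // simulated gap before the next element
    }

    System.out.println(elapsedMillis(rootTimes, sinkTimes)); // prints 140
  }
}
```

   A probe placed at only one point in the graph would see just its own 
first and last timestamps, which is why the two ends need to be measured 
separately. With multiple roots or sinks, the same min/max merge would apply 
across all of them.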

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 164419)
    Time Spent: 3h 20m  (was: 3h 10m)

> Create GroupByKey load test for Java SDK
> ----------------------------------------
>
>                 Key: BEAM-5355
>                 URL: https://issues.apache.org/jira/browse/BEAM-5355
>             Project: Beam
>          Issue Type: Sub-task
>          Components: testing
>            Reporter: Lukasz Gajowy
>            Assignee: Lukasz Gajowy
>            Priority: Minor
>             Fix For: Not applicable
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> This is more thoroughly described in this proposal: 
> [https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing]
>  
> In short: this ticket is about implementing the GroupByKeyLoadIT that uses 
> SyntheticStep and Synthetic source to create load on the pipeline. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
