[
https://issues.apache.org/jira/browse/BEAM-5355?focusedWorklogId=164319&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-164319
]
ASF GitHub Bot logged work on BEAM-5355:
----------------------------------------
Author: ASF GitHub Bot
Created on: 09/Nov/18 11:44
Start Date: 09/Nov/18 11:44
Worklog Time Spent: 10m
Work Description: lgajowy commented on a change in pull request #6987:
[BEAM-5355] Prevent creating metrics of the same name multiple times
URL: https://github.com/apache/beam/pull/6987#discussion_r232226908
##########
File path:
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/GroupByKeyLoadTest.java
##########
@@ -83,15 +83,14 @@ private GroupByKeyLoadTest(String[] args) throws IOException {
   void loadTest() throws IOException {
     Optional<SyntheticStep> syntheticStep = createStep(options.getStepOptions());
-    PCollection<KV<byte[], byte[]>> input =
-        pipeline.apply(SyntheticBoundedIO.readFrom(sourceOptions));
+    PCollection<KV<byte[], byte[]>> input = pipeline
+        .apply(SyntheticBoundedIO.readFrom(sourceOptions))
+        .apply(ParDo.of(new MetricsMonitor(METRICS_NAMESPACE)));
Review comment:
Thanks for the ideas. I had considered them before but wasn't sure it was
necessary to do it this way. It seems that it is.
I think that to collect the total pipeline run_time (which is a distribution
metric), it is enough to flatten the results of all distributions and take the
overall min and max to calculate it. Preferably the monitor should be placed
at the end of the pipeline so that all of the processing time is captured.
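
To make the min/max idea concrete, here is a minimal plain-Java sketch. It does
not use the Beam metrics API: `RunTimeSketch`, `TimeDistribution`, and
`totalRunTimeMillis` are hypothetical names invented for illustration, and the
sketch assumes each distribution records timestamps in epoch milliseconds.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Beam.
final class RunTimeSketch {

  /** Min and max timestamp (assumed epoch millis) reported by one distribution. */
  static final class TimeDistribution {
    final long minMillis;
    final long maxMillis;

    TimeDistribution(long minMillis, long maxMillis) {
      this.minMillis = minMillis;
      this.maxMillis = maxMillis;
    }
  }

  /** Total run time: the latest max minus the earliest min over all distributions. */
  static long totalRunTimeMillis(List<TimeDistribution> distributions) {
    if (distributions.isEmpty()) {
      throw new IllegalArgumentException("at least one distribution is required");
    }
    long earliest = Long.MAX_VALUE;
    long latest = Long.MIN_VALUE;
    for (TimeDistribution d : distributions) {
      earliest = Math.min(earliest, d.minMillis);
      latest = Math.max(latest, d.maxMillis);
    }
    return latest - earliest;
  }

  public static void main(String[] args) {
    List<TimeDistribution> results = Arrays.asList(
        new TimeDistribution(1_000L, 1_400L),
        new TimeDistribution(1_050L, 1_900L),
        new TimeDistribution(1_200L, 1_700L));
    // Earliest min is 1000, latest max is 1900.
    System.out.println(totalRunTimeMillis(results)); // prints 900
  }
}
```

In a real pipeline the per-distribution min and max would come from the metric
query results rather than from hard-coded values.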
For counting total bytes, the result depends on where I measure it. It may be
desirable to measure different sizes at the beginning and at the end of the
pipeline.
This will probably require splitting `MetricsMonitor` into `TimeMonitor` and
`BytesMonitor`. A `TimeMonitor` can be applied anywhere in the pipeline (it
makes little difference, because we are looking for the max and min time across
the whole pipeline). Separate `BytesMonitor`s will report different results
depending on where in the pipeline they are attached.
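
As a rough illustration of why the attachment point matters for byte counts,
here is a plain-Java sketch. It is not a Beam `DoFn`: the `BytesMonitor` class
below is a hypothetical stand-in, and `truncateToHalf` is an invented step that
simply shrinks each element so the two measurements differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this models the idea without the Beam SDK.
final class BytesMonitorSketch {

  /** Stand-in monitor: sums the sizes of the elements passing through it. */
  static final class BytesMonitor {
    private long totalBytes;

    List<byte[]> observe(List<byte[]> elements) {
      for (byte[] element : elements) {
        totalBytes += element.length;
      }
      return elements; // pass elements through unchanged
    }

    long totalBytes() {
      return totalBytes;
    }
  }

  /** Invented processing step that keeps only the first half of each element. */
  static List<byte[]> truncateToHalf(List<byte[]> elements) {
    List<byte[]> out = new ArrayList<>();
    for (byte[] element : elements) {
      out.add(Arrays.copyOf(element, element.length / 2));
    }
    return out;
  }

  public static void main(String[] args) {
    BytesMonitor atStart = new BytesMonitor();
    BytesMonitor atEnd = new BytesMonitor();

    List<byte[]> input = Arrays.asList(new byte[10], new byte[20]);
    List<byte[]> processed = truncateToHalf(atStart.observe(input));
    atEnd.observe(processed);

    System.out.println(atStart.totalBytes()); // prints 30
    System.out.println(atEnd.totalBytes());   // prints 15
  }
}
```

The two monitors see the same logical data but report different totals, which
is the reason for keeping a separate instance per measurement point.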
I will change this in a future contribution; for now I just wanted to share my
thoughts. If you see any flaws, feel free to protest. :)
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 164319)
Time Spent: 3h 10m (was: 3h)
> Create GroupByKey load test for Java SDK
> ----------------------------------------
>
> Key: BEAM-5355
> URL: https://issues.apache.org/jira/browse/BEAM-5355
> Project: Beam
> Issue Type: Sub-task
> Components: testing
> Reporter: Lukasz Gajowy
> Assignee: Lukasz Gajowy
> Priority: Minor
> Fix For: Not applicable
>
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> This is more thoroughly described in this proposal:
> [https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing]
>
> In short: this ticket is about implementing the GroupByKeyLoadIT that uses
> SyntheticStep and Synthetic source to create load on the pipeline.