[
https://issues.apache.org/jira/browse/BEAM-6627?focusedWorklogId=207054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207054
]
ASF GitHub Bot logged work on BEAM-6627:
----------------------------------------
Author: ASF GitHub Bot
Created on: 04/Mar/19 09:40
Start Date: 04/Mar/19 09:40
Worklog Time Spent: 10m
Work Description: mwalenia commented on pull request #7772: [BEAM-6627] Added Metrics API processing time reporting to TextIOIT
URL: https://github.com/apache/beam/pull/7772#discussion_r261982116
##########
File path: sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java
##########
@@ -127,28 +140,49 @@ public void writeThenReadAll() {
PipelineResult result = pipeline.run();
result.waitUntilFinish();
- publishGcsResults(result);
+ gatherAndPublishMetrics(result);
}
- private void publishGcsResults(PipelineResult result) {
+ private void gatherAndPublishMetrics(PipelineResult result) {
+ String uuid = UUID.randomUUID().toString();
+ Timestamp timestamp = Timestamp.now();
+ List<NamedTestResult> namedTestResults = readMetrics(result, uuid, timestamp);
+ publishToBigQuery(namedTestResults, bigQueryDataset, bigQueryTable);
+ ConsoleResultPublisher.publish(namedTestResults, uuid, timestamp.toString());
+ }
+
+ private List<NamedTestResult> readMetrics(
+ PipelineResult result, String uuid, Timestamp timestamp) {
+ List<NamedTestResult> results = new ArrayList<>();
+
+ MetricsReader reader = new MetricsReader(result, FILEIOIT_NAMESPACE);
+ long writeStartTime = reader.getStartTimeMetric("startTime");
+ long writeEndTime = reader.getEndTimeMetric("middleTime");
+ long readStartTime = reader.getStartTimeMetric("middleTime");
+ long readEndTime = reader.getEndTimeMetric("endTime");
+ double writeTime = (writeEndTime - writeStartTime) / 1000.0;
+ double readTime = (readEndTime - readStartTime) / 1000.0;
+ double copiesPerSec = calculateGcsMetric(result);
+
+ if (copiesPerSec > 0) {
+ results.add(
+ NamedTestResult.create(uuid, timestamp.toString(), "copies_per_sec", copiesPerSec));
+ }
+
+ results.add(NamedTestResult.create(uuid, timestamp.toString(), "read_time", readTime));
+ results.add(NamedTestResult.create(uuid, timestamp.toString(), "write_time", writeTime));
+
+ return results;
+ }
+
+ private double calculateGcsMetric(PipelineResult result) {
Review comment:
@udim A big advantage of my approach is built-in error checking and reporting. If an argument is passed with a typo, we get, for example:
`java.lang.IllegalArgumentException: Class interface org.apache.beam.sdk.testing.TestPipelineOptions missing a property named 'compresionType'. Did you mean 'compressionType'?`
There is no such mechanism when a list of strings is passed as a parameter value. To switch individual metrics on and off, we would have to search for strings in an array, so a typo would cause a silent failure.
I think we should stick with explicitly set flags for subsequent metrics.
WDYT?
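
The difference between the two approaches can be sketched in plain Java. This is only an illustration under stated assumptions: the class `MetricFlagSketch`, its methods, and the misspelled name `read_tmie` are hypothetical and are not part of Beam or of this pull request. The real check quoted above is performed by Beam's pipeline options parsing, not by code like this.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Beam's PipelineOptions implementation.
class MetricFlagSketch {

  // Metric names this sketch treats as valid (taken from the diff above).
  static final List<String> KNOWN_METRICS =
      Arrays.asList("read_time", "write_time", "copies_per_sec");

  // String-list approach: a misspelled entry never matches any metric,
  // so the metric is silently left disabled.
  static boolean enabledInList(List<String> requested, String metric) {
    return requested.contains(metric);
  }

  // Explicit-flag approach: an unknown name is rejected as soon as it is seen.
  static void validateFlag(String name) {
    if (!KNOWN_METRICS.contains(name)) {
      throw new IllegalArgumentException("Unknown metric flag '" + name + "'");
    }
  }

  public static void main(String[] args) {
    // The user meant "read_time" but made a typo.
    List<String> requested = Arrays.asList("read_tmie");

    // No error is raised; the lookup just returns false.
    System.out.println(enabledInList(requested, "read_time"));

    // The same typo is reported immediately when names are validated.
    try {
      validateFlag("read_tmie");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With the list lookup, the typo only shows up later as a missing metric in the results, whereas the validated flag fails fast at startup.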
Issue Time Tracking
-------------------
Worklog Id: (was: 207054)
Time Spent: 6.5h (was: 6h 20m)
> Use Metrics API in IO performance tests
> ---------------------------------------
>
> Key: BEAM-6627
> URL: https://issues.apache.org/jira/browse/BEAM-6627
> Project: Beam
> Issue Type: Improvement
> Components: testing
> Reporter: Michal Walenia
> Assignee: Michal Walenia
> Priority: Minor
> Time Spent: 6.5h
> Remaining Estimate: 0h
>