[ 
https://issues.apache.org/jira/browse/BEAM-6627?focusedWorklogId=205038&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-205038
 ]

ASF GitHub Bot logged work on BEAM-6627:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Feb/19 07:56
            Start Date: 27/Feb/19 07:56
    Worklog Time Spent: 10m 
      Work Description: mwalenia commented on pull request #7772: [BEAM-6627] 
Added Metrics API processing time reporting to TextIOIT
URL: https://github.com/apache/beam/pull/7772#discussion_r260627168
 
 

 ##########
 File path: 
sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java
 ##########
 @@ -127,28 +140,49 @@ public void writeThenReadAll() {
 
     PipelineResult result = pipeline.run();
     result.waitUntilFinish();
-    publishGcsResults(result);
+    gatherAndPublishMetrics(result);
   }
 
-  private void publishGcsResults(PipelineResult result) {
+  private void gatherAndPublishMetrics(PipelineResult result) {
+    String uuid = UUID.randomUUID().toString();
+    Timestamp timestamp = Timestamp.now();
+    List<NamedTestResult> namedTestResults = readMetrics(result, uuid, 
timestamp);
+    publishToBigQuery(namedTestResults, bigQueryDataset, bigQueryTable);
+    ConsoleResultPublisher.publish(namedTestResults, uuid, 
timestamp.toString());
+  }
+
+  private List<NamedTestResult> readMetrics(
+      PipelineResult result, String uuid, Timestamp timestamp) {
+    List<NamedTestResult> results = new ArrayList<>();
+
+    MetricsReader reader = new MetricsReader(result, FILEIOIT_NAMESPACE);
+    long writeStartTime = reader.getStartTimeMetric("startTime");
+    long writeEndTime = reader.getEndTimeMetric("middleTime");
+    long readStartTime = reader.getStartTimeMetric("middleTime");
+    long readEndTime = reader.getEndTimeMetric("endTime");
+    double writeTime = (writeEndTime - writeStartTime) / 1000.0;
+    double readTime = (readEndTime - readStartTime) / 1000.0;
+    double copiesPerSec = calculateGcsMetric(result);
+
+    if (copiesPerSec > 0) {
+      results.add(
+          NamedTestResult.create(uuid, timestamp.toString(), "copies_per_sec", 
copiesPerSec));
+    }
+
+    results.add(NamedTestResult.create(uuid, timestamp.toString(), 
"read_time", readTime));
+    results.add(NamedTestResult.create(uuid, timestamp.toString(), 
"write_time", writeTime));
+
+    return results;
+  }
+
+  private double calculateGcsMetric(PipelineResult result) {
 
 Review comment:
   @udim What about cases in which we need some metrics, but not others? Take 
for example the hdfs copies, when we want this metric, other may be irrelevant, 
like gs copies. In that case we'd need to check if each metric has valid values 
and then decide whether to report it (like it's done now with gcs copies - 
number of operations greater than 0 and time not below 0? Calculate and 
report). I believe setting explicit flags for every metric wouldn't be simpler, 
but it would be easier to precisely control what is gathered and reported.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 205038)
    Time Spent: 5h 40m  (was: 5.5h)

> Use Metrics API in IO performance tests
> ---------------------------------------
>
>                 Key: BEAM-6627
>                 URL: https://issues.apache.org/jira/browse/BEAM-6627
>             Project: Beam
>          Issue Type: Improvement
>          Components: testing
>            Reporter: Michal Walenia
>            Assignee: Michal Walenia
>            Priority: Minor
>          Time Spent: 5h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to