[jira] [Work logged] (BEAM-7484) Throughput collection in BigQuery performance tests

ASF GitHub Bot (JIRA) Thu, 11 Jul 2019 14:01:32 -0700


     [ 
https://issues.apache.org/jira/browse/BEAM-7484?focusedWorklogId=275515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275515
 ]


ASF GitHub Bot logged work on BEAM-7484:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Jul/19 21:00
            Start Date: 11/Jul/19 21:00
    Worklog Time Spent: 10m 
      Work Description: udim commented on pull request #8766: [BEAM-7484] 
Metrics collection in BigQuery perf tests
URL: https://github.com/apache/beam/pull/8766#discussion_r302695476
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery_read_perf_test.py
 ##########
 @@ -126,9 +133,21 @@ def format_record(record):
     p.run().wait_until_finish()
 
   def test(self):
+    def extract_values(row):
+      """Extracts value from a row."""
+      yield base64.b64decode(row.values()[0])
+
     self.result = (self.pipeline
                    | 'Read from BigQuery' >> Read(BigQuerySource(
                        dataset=self.input_dataset, table=self.input_table))
+                   | 'Measure bytes' >> ParDo(MeasureBytes(
+                       self.metrics_namespace, extract_values))
+                   | 'Count messages' >> ParDo(CountMessages(
+                       self.metrics_namespace))
+                   | 'Measure time: Start' >> ParDo(MeasureTime(
+                       self.metrics_namespace))
+                   | 'Measure time: End' >> ParDo(MeasureTime(
 
 Review comment:
   I don't understand what the difference is between Start and End versions of 
MeasureTime. From what I understand, you're measuring throughput, like bytes 
per second and messages per second. What time values do you extract from these 
metrics? Total running time? Momentary running time?
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 275515)
    Time Spent: 2h 10m  (was: 2h)

> Throughput collection in BigQuery performance tests
> ---------------------------------------------------
>
>                 Key: BEAM-7484
>                 URL: https://issues.apache.org/jira/browse/BEAM-7484
>             Project: Beam
>          Issue Type: New Feature
>          Components: testing
>            Reporter: Kamil Wasilewski
>            Assignee: Kamil Wasilewski
>            Priority: Major
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The goal is to collect bytes/time and messages/time metrics in BQ read and 
> write tests in Python SDK.
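
For illustration only, here is a minimal sketch of how bytes/time and messages/time could be derived once a run has recorded a byte total, a message count, and a start and end timestamp. It is plain Python and does not use the Beam metrics API. The `throughput` helper, the `data` column name, and the sample timestamps are assumptions made for this example and are not taken from the pull request.

```python
import base64


def throughput(payloads, start_time, end_time):
    """Return (bytes_per_second, messages_per_second) for one run.

    payloads: list of decoded byte strings, one per message.
    start_time, end_time: timestamps in seconds (hypothetical inputs).
    """
    elapsed = end_time - start_time
    if elapsed <= 0:
        raise ValueError("end_time must be later than start_time")
    total_bytes = sum(len(p) for p in payloads)
    return total_bytes / elapsed, len(payloads) / elapsed


# Hypothetical rows: a single base64-encoded column, similar in shape to
# what extract_values() in the diff decodes.
rows = [{"data": base64.b64encode(b"x" * 100).decode("ascii")}
        for _ in range(50)]

# Decode the first (and only) value of each row.
payloads = [base64.b64decode(next(iter(row.values()))) for row in rows]

bytes_per_sec, msgs_per_sec = throughput(
    payloads, start_time=10.0, end_time=12.0)
```

With these sample inputs the run covers 5000 bytes and 50 messages over 2 seconds. Whether the tests report a total running time or a momentary rate is the open question in the review comment, and this sketch only covers the total case.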



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
