[
https://issues.apache.org/jira/browse/BEAM-7484?focusedWorklogId=275515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275515
]
ASF GitHub Bot logged work on BEAM-7484:
----------------------------------------
Author: ASF GitHub Bot
Created on: 11/Jul/19 21:00
Start Date: 11/Jul/19 21:00
Worklog Time Spent: 10m
Work Description: udim commented on pull request #8766: [BEAM-7484]
Metrics collection in BigQuery perf tests
URL: https://github.com/apache/beam/pull/8766#discussion_r302695476
##########
File path: sdks/python/apache_beam/io/gcp/bigquery_read_perf_test.py
##########
@@ -126,9 +133,21 @@ def format_record(record):
     p.run().wait_until_finish()
   def test(self):
+    def extract_values(row):
+      """Extracts value from a row."""
+      yield base64.b64decode(row.values()[0])
+
     self.result = (self.pipeline
                    | 'Read from BigQuery' >> Read(BigQuerySource(
                        dataset=self.input_dataset, table=self.input_table))
+                   | 'Measure bytes' >> ParDo(MeasureBytes(
+                       self.metrics_namespace, extract_values))
+                   | 'Count messages' >> ParDo(CountMessages(
+                       self.metrics_namespace))
+                   | 'Measure time: Start' >> ParDo(MeasureTime(
+                       self.metrics_namespace))
+                   | 'Measure time: End' >> ParDo(MeasureTime(
Review comment:
I don't understand what the difference is between Start and End versions of
MeasureTime. From what I understand, you're measuring throughput, like bytes
per second and messages per second. What time values do you extract from these
metrics? Total running time? Momentary running time?
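
To make the question concrete, here is a minimal sketch of one way a total running time could be turned into bytes per second and messages per second. It is an assumption for illustration only: it does not use Beam, and it does not claim to show how MeasureTime, MeasureBytes or CountMessages work in the PR. The names TimestampDistribution, throughput and run_example are hypothetical. The idea is to record a timestamp each time an element is handled, keep only the minimum and maximum, and divide the totals by (max - min).

```python
import time


class TimestampDistribution(object):
    """Hypothetical stand-in for a distribution metric: tracks min and max."""

    def __init__(self):
        self.min = None
        self.max = None

    def update(self, value):
        # Keep only the earliest and latest timestamps seen so far.
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)


def throughput(total, distribution):
    """Returns total / (max - min), or None if no time has elapsed."""
    if distribution.min is None:
        return None
    elapsed = distribution.max - distribution.min
    if elapsed <= 0:
        return None
    return total / float(elapsed)


def run_example(clock=time.time):
    """Processes three made-up rows and reports (bytes/s, messages/s)."""
    dist = TimestampDistribution()
    rows = [b'abcd', b'efgh', b'ij']  # illustrative data, 10 bytes in total
    total_bytes = 0
    count = 0
    for row in rows:
        dist.update(clock())  # timestamp taken when the element is handled
        total_bytes += len(row)
        count += 1
    dist.update(clock())  # final timestamp once all elements are done
    return throughput(total_bytes, dist), throughput(count, dist)
```

Under this reading, a single pair of earliest and latest timestamps gives a total running time. A momentary rate would instead need timestamps bucketed per interval, which this sketch does not attempt.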
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 275515)
Time Spent: 2h 10m (was: 2h)
> Throughput collection in BigQuery performance tests
> ---------------------------------------------------
>
> Key: BEAM-7484
> URL: https://issues.apache.org/jira/browse/BEAM-7484
> Project: Beam
> Issue Type: New Feature
> Components: testing
> Reporter: Kamil Wasilewski
> Assignee: Kamil Wasilewski
> Priority: Major
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> The goal is to collect bytes/time and messages/time metrics in BQ read and
> write tests in Python SDK.