[ https://issues.apache.org/jira/browse/BEAM-6291?focusedWorklogId=192248&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-192248 ]
ASF GitHub Bot logged work on BEAM-6291:
----------------------------------------
Author: ASF GitHub Bot
Created on: 30/Jan/19 12:50
Start Date: 30/Jan/19 12:50
Worklog Time Spent: 10m
Work Description: kkucharc commented on pull request #7614: [BEAM-6291] Generic BigQuery schema load tests metrics
URL: https://github.com/apache/beam/pull/7614#discussion_r252242062
##########
File path: sdks/python/apache_beam/testing/load_tests/load_test_metrics_utils.py
##########
@@ -36,49 +47,79 @@
SchemaField = None
NotFound = None
-RUNTIME_LABEL = 'runtime'
-SUBMIT_TIMESTAMP_LABEL = 'submit_timestamp'
-
-
-def _get_schema_field(schema_field):
+RUNTIME_METRIC = 'runtime'
+COUNTER_LABEL = "total_bytes_count"
+
+ID_LABEL = 'test_id'
+SUBMIT_TIMESTAMP_LABEL = 'timestamp'
+METRICS_TYPE_LABEL = 'metric'
+VALUE_LABEL = 'value'
+
+SCHEMA = [
+    {'name': ID_LABEL,
+     'type': 'STRING',
+     'mode': 'REQUIRED'
+    },
+    {'name': SUBMIT_TIMESTAMP_LABEL,
+     'type': 'TIMESTAMP',
+     'mode': 'REQUIRED'
+    },
+    {'name': METRICS_TYPE_LABEL,
+     'type': 'STRING',
+     'mode': 'REQUIRED'
+    },
+    {'name': VALUE_LABEL,
+     'type': 'FLOAT',
+     'mode': 'REQUIRED'
+    }
+]
+
+
+def get_schema_field(schema_field):
   return SchemaField(
       name=schema_field['name'],
       field_type=schema_field['type'],
       mode=schema_field['mode'])
+def get_element_by_schema(schema_name, insert_list):
+  for element in insert_list:
+    if element['label'] == schema_name:
+      return element['value']
+
+
class BigQueryClient(object):
-  def __init__(self, project_name, table, dataset, schema_map):
+  def __init__(self, project_name, table, dataset):
     self._namespace = table
     self._bq_client = bigquery.Client(project=project_name)
-    schema = self._parse_schema(schema_map)
-    self._schema_names = self._get_schema_names(schema)
-    schema = self._prepare_schema(schema)
+    self._schema_names = self._get_schema_names()
+    schema = self._prepare_schema()
     self._get_or_create_table(schema, dataset)
-  def match_and_save(self, result_list):
-    rows_tuple = tuple(self._match_inserts_by_schema(result_list))
-    self._insert_data(rows_tuple)
+  def match_and_save(self, results_lists):
Review comment:
I agree with you 100%. I had the same impression that this naming is not very clear.
I will refactor it according to the suggestions. Hopefully that will simplify it.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 192248)
Time Spent: 1h (was: 50m)
> Make the schema for BQ tables storing metric results more generic (Python)
> --------------------------------------------------------------------------
>
> Key: BEAM-6291
> URL: https://issues.apache.org/jira/browse/BEAM-6291
> Project: Beam
> Issue Type: Sub-task
> Components: testing
> Reporter: Lukasz Gajowy
> Assignee: Kasia Kucharczyk
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Currently, we keep the metrics results in BQ in tables with a schema like
> this:
> timestamp | total_bytes | run_time | (possibly other BQ columns)
> Every time we want to add a new column, the schema has to be extended. This is
> inconvenient because each load test can store different metrics, which in
> turn would lead to multiple BQ tables, each queried differently.
> We can provide a more generic schema, like so:
> test_id | timestamp | metric | value
> Thanks to that, every metric, whatever its name is, can be saved in the
> table as a separate row. This gives more flexibility in storing metrics and is
> still easy to query and plot.
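
To illustrate the generic schema described above, here is a minimal sketch of how one test run's metrics could be turned into one row per metric and grouped back again for plotting. The column names come from the issue description; the helpers to_generic_rows and pivot_by_test, the sample metric values, and the 'run-1' id are hypothetical and are not taken from the pull request.

```python
import time
import uuid

# Column names follow the generic schema from the issue description.
ID_LABEL = 'test_id'
SUBMIT_TIMESTAMP_LABEL = 'timestamp'
METRICS_TYPE_LABEL = 'metric'
VALUE_LABEL = 'value'


def to_generic_rows(metrics, test_id=None, timestamp=None):
  """Hypothetical helper: turns {'metric name': number} into one row per metric."""
  # Every row of a single run shares the same test_id and timestamp.
  test_id = test_id or uuid.uuid4().hex
  timestamp = time.time() if timestamp is None else timestamp
  return [
      {ID_LABEL: test_id,
       SUBMIT_TIMESTAMP_LABEL: timestamp,
       METRICS_TYPE_LABEL: name,
       VALUE_LABEL: float(value)}
      for name, value in sorted(metrics.items())
  ]


def pivot_by_test(rows):
  """Hypothetical helper: groups rows back into {test_id: {metric: value}}."""
  result = {}
  for row in rows:
    per_test = result.setdefault(row[ID_LABEL], {})
    per_test[row[METRICS_TYPE_LABEL]] = row[VALUE_LABEL]
  return result


# Illustrative values only; a new metric needs a new row, not a new column.
rows = to_generic_rows({'runtime': 12.5, 'total_bytes_count': 2048},
                       test_id='run-1', timestamp=1548852600.0)
for row in rows:
  print(row)
```

Because the metric name is stored as data rather than as a column, adding another metric to a load test only means passing one more key in the dictionary; the table schema stays the same.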
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)