[
https://issues.apache.org/jira/browse/BEAM-6291?focusedWorklogId=190304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-190304
]
ASF GitHub Bot logged work on BEAM-6291:
----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Jan/19 02:47
Start Date: 26/Jan/19 02:47
Worklog Time Spent: 10m
Work Description: udim commented on pull request #7614: [BEAM-6291]
Generic BigQuery schema load tests metrics
URL: https://github.com/apache/beam/pull/7614#discussion_r251185281
##########
File path:
sdks/python/apache_beam/testing/load_tests/load_test_metrics_utils.py
##########
@@ -36,49 +47,79 @@
SchemaField = None
NotFound = None
-RUNTIME_LABEL = 'runtime'
-SUBMIT_TIMESTAMP_LABEL = 'submit_timestamp'
-
-
-def _get_schema_field(schema_field):
+RUNTIME_METRIC = 'runtime'
+COUNTER_LABEL = "total_bytes_count"
+
+ID_LABEL = 'test_id'
+SUBMIT_TIMESTAMP_LABEL = 'timestamp'
+METRICS_TYPE_LABEL = 'metric'
+VALUE_LABEL = 'value'
+
+SCHEMA = [
+ {'name': ID_LABEL,
+ 'type': 'STRING',
+ 'mode': 'REQUIRED'
+ },
+ {'name': SUBMIT_TIMESTAMP_LABEL,
+ 'type': 'TIMESTAMP',
+ 'mode': 'REQUIRED'
+ },
+ {'name': METRICS_TYPE_LABEL,
+ 'type': 'STRING',
+ 'mode': 'REQUIRED'
+ },
+ {'name': VALUE_LABEL,
+ 'type': 'FLOAT',
+ 'mode': 'REQUIRED'
+ }
+]
+
+
+def get_schema_field(schema_field):
return SchemaField(
name=schema_field['name'],
field_type=schema_field['type'],
mode=schema_field['mode'])
+def get_element_by_schema(schema_name, insert_list):
+ for element in insert_list:
+ if element['label'] == schema_name:
+ return element['value']
+
+
class BigQueryClient(object):
- def __init__(self, project_name, table, dataset, schema_map):
+ def __init__(self, project_name, table, dataset):
self._namespace = table
self._bq_client = bigquery.Client(project=project_name)
- schema = self._parse_schema(schema_map)
- self._schema_names = self._get_schema_names(schema)
- schema = self._prepare_schema(schema)
+ self._schema_names = self._get_schema_names()
+ schema = self._prepare_schema()
self._get_or_create_table(schema, dataset)
- def match_and_save(self, result_list):
- rows_tuple = tuple(self._match_inserts_by_schema(result_list))
- self._insert_data(rows_tuple)
+ def match_and_save(self, results_lists):
Review comment:
Could you document what type `results_lists` is?
It seems that each item in `results_lists` is a list of dictionaries, and
each dict looks like:
`{'label': SUBMIT_TIMESTAMP_LABEL, 'value': time.time()}`
but I'm not 100% sure.
I think this module would be easier to understand if each item in
`results_lists` were a single dict:
```py
{
ID_LABEL: uuid,
SUBMIT_TIMESTAMP_LABEL: time.time(),
METRICS_TYPE_LABEL: RUNTIME_METRIC,
VALUE_LABEL: value,
}
```
Note that `_bq_client.insert_rows()` also accepts a list of dicts, so there
would be no need to convert the above to tuple form.
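
A minimal sketch of the suggested shape, assuming each item really is a list of `{'label': ..., 'value': ...}` dicts (which the comment above is itself unsure about). The helper `to_row` and the sample value `12.5` are hypothetical and not part of the PR; the label constants are copied from the diff.

```python
import time
import uuid

# Label constants copied from the diff in this PR.
ID_LABEL = 'test_id'
SUBMIT_TIMESTAMP_LABEL = 'timestamp'
METRICS_TYPE_LABEL = 'metric'
VALUE_LABEL = 'value'
RUNTIME_METRIC = 'runtime'


def to_row(insert_list):
  """Collapses [{'label': ..., 'value': ...}, ...] into one {label: value} dict.

  Hypothetical helper, shown only to illustrate the suggested row shape.
  """
  return {element['label']: element['value'] for element in insert_list}


# Assumed current shape of a single item: a list of label/value dicts.
item = [
    {'label': ID_LABEL, 'value': str(uuid.uuid4())},
    {'label': SUBMIT_TIMESTAMP_LABEL, 'value': time.time()},
    {'label': METRICS_TYPE_LABEL, 'value': RUNTIME_METRIC},
    {'label': VALUE_LABEL, 'value': 12.5},  # illustrative runtime value
]

# One dict per row, keyed by column name.
row = to_row(item)
```

With rows in this form, a lookup such as `row[VALUE_LABEL]` would replace the linear scan in `get_element_by_schema`, and a list of such dicts could presumably be passed to `insert_rows()` directly.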
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 190304)
Time Spent: 0.5h (was: 20m)
> Make the schema for BQ tables storing metric results more generic (Python)
> --------------------------------------------------------------------------
>
> Key: BEAM-6291
> URL: https://issues.apache.org/jira/browse/BEAM-6291
> Project: Beam
> Issue Type: Sub-task
> Components: testing
> Reporter: Lukasz Gajowy
> Assignee: Kasia Kucharczyk
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Currently, we keep the metrics results in BQ in tables with a schema like
> this:
> timestamp | total_bytes | run_time | (possibly other BQ columns)
> Every time we want to add a new column, the schema has to be extended. This is
> inconvenient because each load test can store different metrics, which in
> turn would lead to multiple BQ tables, each queried differently.
> We can provide a more generic schema, like so:
> test_id | timestamp | metric | value
> Thanks to that, every metric, whatever its name is, can be saved in the
> table as a separate row. This gives more flexibility in storing metrics and is
> still easy to query and plot.
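
As an illustration of the generic schema described in the issue, the sketch below turns one set of per-test measurements into one row per metric. The function name `to_generic_rows`, the test id `'run-1'`, and all sample numbers are hypothetical; only the four column names come from the issue text.

```python
def to_generic_rows(test_id, timestamp, metrics):
  """Emits one (test_id, timestamp, metric, value) row per measured metric.

  `metrics` maps a metric name to a numeric value. Hypothetical helper.
  """
  return [
      {
          'test_id': test_id,
          'timestamp': timestamp,
          'metric': name,
          'value': float(value),  # the proposed schema stores values as FLOAT
      }
      # Sorted only to make the row order deterministic.
      for name, value in sorted(metrics.items())
  ]


# Illustrative input: two metrics from a single, made-up test run.
rows = to_generic_rows(
    'run-1', 1548470820.0, {'total_bytes': 1024, 'run_time': 12.5})
```

Adding a third metric here would add a third row rather than a new column, so the table schema would stay unchanged.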
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)