tvalentyn commented on a change in pull request #11661:
URL: https://github.com/apache/beam/pull/11661#discussion_r431314045
##########
File path:
.test-infra/metrics/grafana/dashboards/perftests_metrics/Python_Performance_Tests.json
##########
@@ -0,0 +1,297 @@
+{
Review comment:
Good catch.
Re:
> Apart from that, we can also consider another question: is having those .json files version-controlled profitable? Diffs are often very large, so the standard review process does not apply here.
I agree that those files are hard to review. I was thinking that perhaps we
could autogenerate the final configs from a much smaller config that captures
only the minimal necessary information (the name of the benchmark to display
and the table/row in the database that holds the datapoints). The idea is that
the final configs are never edited manually. It would make sense for the
visualization tool to provide a way to generate a dashboard from a common
template plus that minimal config. Is that what Scripted Dashboards are? The
webpage says this feature is deprecated.
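For illustration, here is a minimal sketch of that autogeneration idea. Everything below (file names, config keys, and the assumed template structure) is hypothetical and not part of this PR; it only shows the shape of a "template + minimal config" generator.

```python
# Hypothetical sketch: render a full Grafana dashboard JSON from a shared
# template plus a minimal per-benchmark config. All names below are made up.
import json

# Minimal config: only what differs between benchmarks.
MINIMAL_CONFIG = {
    'benchmark_name': 'Python WordCount IT',     # dashboard/panel title
    'metrics_table': 'wordcount_py_it_runtime',  # table/measurement with datapoints
}


def render_dashboard(template_path, config, output_path):
  """Fills the shared dashboard template with benchmark-specific values."""
  with open(template_path) as f:
    dashboard = json.load(f)

  dashboard['title'] = config['benchmark_name']
  for panel in dashboard.get('panels', []):
    for target in panel.get('targets', []):
      # Point every query at the benchmark's table/measurement.
      target['measurement'] = config['metrics_table']

  with open(output_path, 'w') as f:
    json.dump(dashboard, f, indent=2, sort_keys=True)


if __name__ == '__main__':
  render_dashboard(
      'dashboard_template.json', MINIMAL_CONFIG, 'Python_Performance_Tests.json')
```

The generated .json would then be treated as a build artifact rather than a hand-edited file, so review would only need to cover the small config and the shared template.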
##########
File path: sdks/python/apache_beam/examples/wordcount_it_test.py
##########
@@ -84,11 +87,45 @@ def _run_wordcount_it(self, run_wordcount, **opts):
# Register clean up before pipeline execution
self.addCleanup(delete_files, [test_output + '*'])
+ publish_to_bq = bool(
+ test_pipeline.get_option('publish_to_big_query') or False)
+
+ # Start measure time for performance test
+ start_time = time.time()
+
# Get pipeline options from command argument: --test-pipeline-options,
# and start pipeline job by calling pipeline main function.
run_wordcount(
test_pipeline.get_full_options_as_args(**extra_opts),
- save_main_session=False)
+ save_main_session=False,
+ )
+
+ end_time = time.time()
+ run_time = end_time - start_time
+
+ if publish_to_bq:
+ self._publish_metrics(test_pipeline, run_time)
+
+ def _publish_metrics(self, pipeline, metric_value):
+ influx_options = InfluxDBMetricsPublisherOptions(
+ pipeline.get_option('influx_measurement'),
+ pipeline.get_option('influx_db_name'),
+ pipeline.get_option('influx_hostname'),
+ os.getenv('INFLUXDB_USER'),
+ os.getenv('INFLUXDB_USER_PASSWORD'),
+ )
+ metric_reader = MetricsReader(
+ project_name=pipeline.get_option('project'),
+ bq_table=pipeline.get_option('metrics_table'),
+ bq_dataset=pipeline.get_option('metrics_dataset'),
+ publish_to_bq=True,
+ influxdb_options=influx_options,
+ )
+
+ metric_reader.publish_values((
+ metric_value,
Review comment:
I think the metric name should be 'Runtime', or something like that, rather
than the test suite name.
Also, it may be more common to pass (key, value) pairs instead of (value, key).
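For example, the call might look roughly like this (a sketch of the suggested ordering only; the metric name 'runtime' and the list-of-tuples shape accepted by `publish_values` are assumptions, not verified against the final API):

```python
# Sketch of the suggested (key, value) ordering; 'runtime' is an illustrative
# metric name and the exact publish_values signature may differ.
metric_reader.publish_values([
    ('runtime', metric_value),
])
```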