[ https://issues.apache.org/jira/browse/BEAM-4430?focusedWorklogId=111473&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-111473 ]
ASF GitHub Bot logged work on BEAM-4430:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Jun/18 11:19
            Start Date: 13/Jun/18 11:19
    Worklog Time Spent: 10m
      Work Description: szewi commented on a change in pull request #465: [BEAM-4430] Improve Performance Testing Documentation
URL: https://github.com/apache/beam-site/pull/465#discussion_r195032346

##########
File path: src/documentation/io/testing.md
##########
@@ -147,21 +147,30 @@ However, **PerfKit Benchmarker is not required for running integration tests**.

 Prerequisites:
 1. [Install PerfKit Benchmarker](https://github.com/GoogleCloudPlatform/PerfKitBenchmarker)
-1. Have a running Kubernetes cluster you can connect to locally using kubectl
+1. Have a running Kubernetes cluster you can connect to locally using kubectl. A cluster hosted on Google Kubernetes Engine might be the best fit as it is used to run the tests on Beam's Jenkins.

-You won’t need to invoke PerfKit Benchmarker directly. Run `./gradlew performanceTest` in project's root directory, passing appropriate kubernetes scripts depending on the network you're using (local network or remote one).
+You won’t need to invoke PerfKit Benchmarker directly. Run `./gradlew performanceTest` task in project's root directory, passing kubernetes scripts of your choice (located in .test_infra/kubernetes directory). It will setup PerfKitBenchmarker for you.

-Example run with the direct runner:
+Example run with the Direct runner:
 ```
 ./gradlew performanceTest -DpkbLocation="/Users/me/PerfKitBenchmarker/pkb.py" -DintegrationTestPipelineOptions='["--numberOfRecords=1000"]' -DitModule=sdks/java/io/jdbc/ -DintegrationTest=org.apache.beam.sdk.io.jdbc.JdbcIOIT -DkubernetesScripts="/Users/me/beam/.test-infra/kubernetes/postgres/postgres-service-for-local-dev.yml" -DbeamITOptions="/Users/me/beam/.test-infra/kubernetes/postgres/pkb-config-local.yml" -DintegrationTestRunner=direct
 ```

 Example run with the Cloud Dataflow runner:
 ```
-/gradlew performanceTest -DpkbLocation="/Users/me/PerfKitBenchmarker/pkb.py" -DintegrationTestPipelineOptions='["--numberOfRecords=1000", "--project=GOOGLE_CLOUD_PROJECT", "--tempRoot=GOOGLE_STORAGE_BUCKET"]' -DitModule=sdks/java/io/jdbc/ -DintegrationTest=org.apache.beam.sdk.io.jdbc.JdbcIOIT -DkubernetesScripts="/Users/me/beam/.test-infra/kubernetes/postgres/postgres-service-for-local-dev.yml" -DbeamITOptions="/Users/me/beam/.test-infra/kubernetes/postgres/pkb-config-local.yml" -DintegrationTestRunner=dataflow
+./gradlew performanceTest -DpkbLocation="/Users/me/PerfKitBenchmarker/pkb.py" -DintegrationTestPipelineOptions='["--numberOfRecords=1000", "--project=GOOGLE_CLOUD_PROJECT", "--tempRoot=GOOGLE_STORAGE_BUCKET"]' -DitModule=sdks/java/io/jdbc/ -DintegrationTest=org.apache.beam.sdk.io.jdbc.JdbcIOIT -DkubernetesScripts="/Users/me/beam/.test-infra/kubernetes/postgres/postgres-service-for-local-dev.yml" -DbeamITOptions="/Users/me/beam/.test-infra/kubernetes/postgres/pkb-config-local.yml" -DintegrationTestRunner=dataflow
 ```

+Example run with the HDFS filesystem and Cloud Dataflow runner:
+
+HDFS clusters require `export HADOOP_USER_NAME=root` to be set before runnning `performanceTest` task.
+
+```
+export HADOOP_USER_NAME=root

Review comment:
Please see comment above.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 111473)
    Time Spent: 50m  (was: 40m)

> Improve Performance Testing Documentation
> -----------------------------------------
>
>                 Key: BEAM-4430
>                 URL: https://issues.apache.org/jira/browse/BEAM-4430
>             Project: Beam
>          Issue Type: Wish
>          Components: testing
>            Reporter: Łukasz Gajowy
>            Assignee: Łukasz Gajowy
>            Priority: Critical
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, the only documentation regarding IO Performance Testing can be
> found here:
> [https://beam.apache.org/documentation/io/testing/#i-o-transform-integration-tests].
> This is certainly not enough given that the performance testing framework
> currently allows to run tests:
>  - on local or hdfs filesystems
>  - on direct or dataflow runners
>  - manually using integrationTest task
>  - automatically using performanceTest task
>  - using pkb.py tool directly (PerfKitBenchmarker)
>  - on demand from pending Pull Requests
>  - detecting anomalies
>  - gathering results in dashboards
> All the above bullets (and maybe others - to be investigated) need more
> explanation in the docs to make the Performance Testing Framework usable by
> the broader community.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)