[ 
https://issues.apache.org/jira/browse/BEAM-4430?focusedWorklogId=111473&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-111473
 ]

ASF GitHub Bot logged work on BEAM-4430:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Jun/18 11:19
            Start Date: 13/Jun/18 11:19
    Worklog Time Spent: 10m 
      Work Description: szewi commented on a change in pull request #465: 
[BEAM-4430] Improve Performance Testing Documentation
URL: https://github.com/apache/beam-site/pull/465#discussion_r195032346
 
 

 ##########
 File path: src/documentation/io/testing.md
 ##########
 @@ -147,21 +147,30 @@ However, **PerfKit Benchmarker is not required for 
running integration tests**.
 
 Prerequisites:
 1.  [Install PerfKit 
Benchmarker](https://github.com/GoogleCloudPlatform/PerfKitBenchmarker)
-1.  Have a running Kubernetes cluster you can connect to locally using kubectl
+1.  Have a running Kubernetes cluster you can connect to locally using 
kubectl. A cluster hosted on Google Kubernetes Engine might be the best fit as 
it is used to run the tests on Beam's Jenkins.
 
-You won’t need to invoke PerfKit Benchmarker directly. Run `./gradlew 
performanceTest` in project's root directory, passing appropriate kubernetes 
scripts depending on the network you're using (local network or remote one).
+You won’t need to invoke PerfKit Benchmarker directly. Run the 
`./gradlew performanceTest` task in the project's root directory, passing 
Kubernetes scripts of your choice (located in the `.test-infra/kubernetes` 
directory). It will set up PerfKit Benchmarker for you.
 
-Example run with the direct runner:
+Example run with the Direct runner:
 ```
 ./gradlew performanceTest -DpkbLocation="/Users/me/PerfKitBenchmarker/pkb.py" 
-DintegrationTestPipelineOptions='["--numberOfRecords=1000"]' 
-DitModule=sdks/java/io/jdbc/ 
-DintegrationTest=org.apache.beam.sdk.io.jdbc.JdbcIOIT 
-DkubernetesScripts="/Users/me/beam/.test-infra/kubernetes/postgres/postgres-service-for-local-dev.yml"
 
-DbeamITOptions="/Users/me/beam/.test-infra/kubernetes/postgres/pkb-config-local.yml"
 -DintegrationTestRunner=direct
 ```
 
 
 Example run with the Cloud Dataflow runner:
 ```
-/gradlew performanceTest -DpkbLocation="/Users/me/PerfKitBenchmarker/pkb.py" 
-DintegrationTestPipelineOptions='["--numberOfRecords=1000", 
"--project=GOOGLE_CLOUD_PROJECT", "--tempRoot=GOOGLE_STORAGE_BUCKET"]' 
-DitModule=sdks/java/io/jdbc/ 
-DintegrationTest=org.apache.beam.sdk.io.jdbc.JdbcIOIT 
-DkubernetesScripts="/Users/me/beam/.test-infra/kubernetes/postgres/postgres-service-for-local-dev.yml"
 
-DbeamITOptions="/Users/me/beam/.test-infra/kubernetes/postgres/pkb-config-local.yml"
 -DintegrationTestRunner=dataflow
+./gradlew performanceTest -DpkbLocation="/Users/me/PerfKitBenchmarker/pkb.py" 
-DintegrationTestPipelineOptions='["--numberOfRecords=1000", 
"--project=GOOGLE_CLOUD_PROJECT", "--tempRoot=GOOGLE_STORAGE_BUCKET"]' 
-DitModule=sdks/java/io/jdbc/ 
-DintegrationTest=org.apache.beam.sdk.io.jdbc.JdbcIOIT 
-DkubernetesScripts="/Users/me/beam/.test-infra/kubernetes/postgres/postgres-service-for-local-dev.yml"
 
-DbeamITOptions="/Users/me/beam/.test-infra/kubernetes/postgres/pkb-config-local.yml"
 -DintegrationTestRunner=dataflow
 ```
 
+Example run with the HDFS filesystem and Cloud Dataflow runner:
+
+HDFS clusters require `export HADOOP_USER_NAME=root` to be set before running 
the `performanceTest` task.
+
+```
+export HADOOP_USER_NAME=root
 
 Review comment:
   Please see comment above.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 111473)
    Time Spent: 50m  (was: 40m)

> Improve Performance Testing Documentation
> -----------------------------------------
>
>                 Key: BEAM-4430
>                 URL: https://issues.apache.org/jira/browse/BEAM-4430
>             Project: Beam
>          Issue Type: Wish
>          Components: testing
>            Reporter: Łukasz Gajowy
>            Assignee: Łukasz Gajowy
>            Priority: Critical
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, the only documentation regarding IO Performance Testing can be 
> found here: 
> [https://beam.apache.org/documentation/io/testing/#i-o-transform-integration-tests].
>  This is certainly not enough, given that the performance testing framework 
> currently supports:
>  - running tests on local or HDFS filesystems
>  - running tests on the Direct or Dataflow runners
>  - running tests manually using the integrationTest task
>  - running tests automatically using the performanceTest task
>  - running tests using the pkb.py tool directly (PerfKitBenchmarker)
>  - running tests on demand from pending Pull Requests
>  - detecting anomalies
>  - gathering results in dashboards
> All the above bullets (and maybe others - to be investigated) need more 
> explanation in the docs to make the Performance Testing Framework usable by 
> the broader community.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
