[
https://issues.apache.org/jira/browse/BEAM-4430?focusedWorklogId=111470&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-111470
]
ASF GitHub Bot logged work on BEAM-4430:
----------------------------------------
Author: ASF GitHub Bot
Created on: 13/Jun/18 11:19
Start Date: 13/Jun/18 11:19
Worklog Time Spent: 10m
Work Description: szewi commented on a change in pull request #465:
[BEAM-4430] Improve Performance Testing Documentation
URL: https://github.com/apache/beam-site/pull/465#discussion_r195042613
##########
File path: src/documentation/io/testing.md
##########
@@ -220,31 +229,207 @@ Parameter descriptions:
<td>Runner to be used for running the test. Currently possible options
are: direct, dataflow.
</td>
</tr>
+ <tr>
+ <td>-DbeamExtraProperties
+ </td>
+ <td>Any other "extra properties" to be passed to Gradle, eg.
"'[filesystem=hdfs]'".
+ </td>
+ </tr>
</tbody>
</table>
-
-
#### Without PerfKit Benchmarker {#without-perfkit-benchmarker}
-If you're using Kubernetes, make sure you can connect to your cluster locally
using kubectl. Otherwise, skip to step 3 below.
+If you're using Kubernetes scripts to host data stores, make sure you can
connect to your cluster locally using kubectl. If you have your own data stores
already setup, you just need to execute step 3 from below list.
1. Set up the data store corresponding to the test you wish to run. You can
find Kubernetes scripts for all currently supported data stores in
[.test-infra/kubernetes](https://github.com/apache/beam/tree/master/.test-infra/kubernetes).
1. In some cases, there is a setup script (*.sh). In other cases, you can
just run ``kubectl create -f [scriptname]`` to create the data store.
1. Convention dictates there will be:
- 1. A core yml script for the data store itself, plus a `NodePort`
service. The `NodePort` service opens a port to the data store for anyone who
connects to the Kubernetes cluster's machines.
- 1. A separate script, called for-local-dev, which sets up a
LoadBalancer service.
+ 1. A yml script for the data store itself, plus a `NodePort` service.
The `NodePort` service opens a port to the data store for anyone who connects
to the Kubernetes cluster's machines from within same subnetwork. Such scripts
are typically useful when running the scripts on Minikube Kubernetes Engine.
+ 1. A separate script, with LoadBalancer service. Such service will
expose an _external ip_ for the datastore. Such scripts are needed when
external access is required (eg. on Jenkins).
1. Examples:
1. For JDBC, you can set up Postgres: `kubectl create -f
.test-infra/kubernetes/postgres/postgres.yml`
1. For Elasticsearch, you can run the setup script: `bash
.test-infra/kubernetes/elasticsearch/setup.sh`
1. Determine the IP address of the service:
1. NodePort service: `kubectl get pods -l 'component=elasticsearch' -o
jsonpath={.items[0].status.podIP}`
1. LoadBalancer service:` kubectl get svc elasticsearch-external -o
jsonpath='{.status.loadBalancer.ingress[0].ip}'`
-1. Run the test using the instructions in the class (e.g. see the
instructions in JdbcIOIT.java)
+1. Run the test using `integrationTest` gradle task and the instructions in
the test class (e.g. see the instructions in JdbcIOIT.java).
1. Tell Kubernetes to delete the resources specified in the Kubernetes
scripts:
1. JDBC: `kubectl delete -f .test-infra/kubernetes/postgres/postgres.yml`
1. Elasticsearch: `bash .test-infra/kubernetes/elasticsearch/teardown.sh`
+##### integrationTest Task {#integration-test-task}
+
+Since `performanceTest` task involved running PerfkitBenchmarker, we can't use
it to run the tests manually. For such purposes a more "low-level" task called
`integrationTest` was introduced.
+
+
+Example usage on Cloud Dataflow runner:
+
+```
+./gradlew integrationTest -p sdks/java/io/hadoop-input-format
-DintegrationTestPipelineOptions='["--project=GOOGLE_CLOUD_PROJECT",
"--tempRoot=GOOGLE_STORAGE_BUCKET", "--numberOfRecords=1000",
"--postgresPort=5432", "--postgresServerName=SERVER_NAME",
"--postgresUsername=postgres", "--postgresPassword=PASSWORD",
"--postgresDatabaseName=postgres", "--postgresSsl=false",
"--runner=TestDataflowRunner"]' -DintegrationTestRunner=dataflow
--tests=org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIOIT
+```
+
+Example usage on HDFS filesystem and Direct runner:
Review comment:
This only works when the /etc/hosts file contains entries with the external
IPs of the Hadoop namenode and datanodes; otherwise the user gets a
`java.nio.channels.UnresolvedAddressException`. It is worth mentioning, although
this information is already in the comment section of the yml files. I suggest
at least changing the line to:
`Example usage on HDFS filesystem and Direct runner (with /etc/hosts entries
added):`
to make people aware of what needs to be done before running this with
DirectRunner.
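For illustration, a minimal sketch of what such /etc/hosts entries could look like. The hostnames (`hadoop-namenode`, `hadoop-datanode`) and the IP addresses are hypothetical placeholders, not values taken from the Beam scripts; the actual names and external IPs depend on your cluster and are described in the comments of the yml files.

```shell
#!/usr/bin/env bash
# Hypothetical placeholders: replace with the external IPs of your own cluster.
NAMENODE_IP="203.0.113.10"
DATANODE_IP="203.0.113.11"

# Print one hosts-file line per Hadoop node (hostnames are assumptions).
hosts_entries() {
  printf '%s\t%s\n' "$NAMENODE_IP" "hadoop-namenode"
  printf '%s\t%s\n' "$DATANODE_IP" "hadoop-datanode"
}

# Only prints the entries; nothing is written to /etc/hosts here.
hosts_entries
```

After reviewing the output, it could be appended with `hosts_entries | sudo tee -a /etc/hosts`.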
Issue Time Tracking
-------------------
Worklog Id: (was: 111470)
Time Spent: 20m (was: 10m)
> Improve Performance Testing Documentation
> -----------------------------------------
>
> Key: BEAM-4430
> URL: https://issues.apache.org/jira/browse/BEAM-4430
> Project: Beam
> Issue Type: Wish
> Components: testing
> Reporter: Łukasz Gajowy
> Assignee: Łukasz Gajowy
> Priority: Critical
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Currently, the only documentation regarding IO Performance Testing can be
> found here:
> [https://beam.apache.org/documentation/io/testing/#i-o-transform-integration-tests].
> This is certainly not enough, given that the performance testing framework
> currently allows running tests:
> - on local or hdfs filesystems
> - on direct or dataflow runners
> - manually using integrationTest task
> - automatically using performanceTest task
> - using pkb.py tool directly (PerfKitBenchmarker)
> - on demand from pending Pull Requests
> - detecting anomalies
> - gathering results in dashboards
> All the above bullets (and maybe others - to be investigated) need more
> explanation in the docs to make the Performance Testing Framework usable by
> the broader community.