This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit 2a8ec170e4e498468c85e0a4848a43b09d66e3ce
Author: Kenneth Knowles <[email protected]>
AuthorDate: Wed Apr 11 11:58:59 2018 -0700

    Update Nexmark launch instructions for Gradle
---
 src/documentation/sdks/nexmark.md | 270 ++++++++++++++++++++++++--------------
 1 file changed, 175 insertions(+), 95 deletions(-)

diff --git a/src/documentation/sdks/nexmark.md b/src/documentation/sdks/nexmark.md
index 5e86cd7..d73943a 100644
--- a/src/documentation/sdks/nexmark.md
+++ b/src/documentation/sdks/nexmark.md
@@ -9,12 +9,14 @@ permalink: /documentation/sdks/java/nexmark/
 ## What it is

 Nexmark is a suite of pipelines inspired by the 'continuous data stream'
-queries in [Nexmark research paper](http://datalab.cs.pdx.edu/niagaraST/NEXMark/)
+queries in [Nexmark research
+paper](http://datalab.cs.pdx.edu/niagaraST/NEXMark/)

-These are multiple queries over a three entities model representing on online auction system:
+These are multiple queries over a three entities model representing an online
+auction system:

- - **Person** represents a person submitting an item for auction and/or making a bid
-   on an auction.
+ - **Person** represents a person submitting an item for auction and/or making
+   a bid on an auction.
  - **Auction** represents an item under auction.
  - **Bid** represents a bid for an item under auction.

@@ -62,11 +64,14 @@ We have augmented the original queries with five more:
 queries.

 ## Benchmark workload configuration
-Here are some of the knobs of the benchmark workload (see [NexmarkConfiguration.java](https://github.com/apache/beam/blob/master/sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkConfiguration.java)).
+
+Here are some of the knobs of the benchmark workload (see
+[NexmarkConfiguration.java](https://github.com/apache/beam/blob/master/sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkConfiguration.java)).

 These configuration items can be passed to the launch command line.

 ### Events generation (defaults)
+
 * 100 000 events generated
 * 100 generator threads
 * Event rate in SIN curve
@@ -76,21 +81,26 @@ These configuration items can be passed to the launch command line.
 * 1000 concurrent persons bidding / creating auctions

 ### Windows (defaults)
+
 * size 10s
 * sliding period 5s
 * watermark hold for 0s

 ### Events Proportions (defaults)
+
 * Hot Auctions = ½
 * Hot Bidders =¼
 * Hot Sellers=¼

 ### Technical
+
 * Artificial CPU load
 * Artificial IO load

 ## Nexmark output
-Here is an example output of the Nexmark benchmark run in streaming mode with the SMOKE suite on the (local) direct runner:
+
+Here is an example output of the Nexmark benchmark run in streaming mode with
+the SMOKE suite on the (local) direct runner:

 <pre>
 Performance:
@@ -112,17 +122,19 @@ Performance:

 ## Benchmark launch configuration

-We can specify the Beam runner to use with maven profiles, available profiles are:
-
-    direct-runner
-    spark-runner
-    flink-runner
-    apex-runner
+The Nexmark launcher accepts the `--runner` argument as usual for programs that
+use Beam PipelineOptions to manage their command line arguments. In addition
+to this, the necessary dependencies must be configured.

-The runner must also be specified like in any other Beam pipeline using:
+When running via Gradle, the following two parameters control the execution:

-    --runner
+    -P nexmark.args
+        The command line to pass to the Nexmark main program.
+    -P nexmark.runner
+        The Gradle project name of the runner, such as ":beam-runners-direct-java" or
+        ":beam-runners-flink". The project names can be found in the root
+        `settings.gradle`.

 Test data is deterministically synthesized on demand. The test data may be
 synthesized in the same pipeline as the query itself,
@@ -417,124 +429,192 @@ Yet to come

 ### Running SMOKE suite on the DirectRunner (local)

+The DirectRunner is default, so it is not required to pass `-Pnexmark.runner`.
+Here we do it for maximum clarity.
+
+The direct runner does not have separate batch and streaming modes, but the
+Nexmark launch does.
+
+These parameters leave on many of the DirectRunner's extra safety checks so the
+SMOKE suite can make sure there is nothing broken in the Nexmark suite.
+
 Batch Mode:

-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pdirect-runner -Dexec.args="--runner=DirectRunner --suite=SMOKE --streaming=false --manageResources=false --monitorJobs=true --enforceEncodability=true --enforceImmutability=true"
+    ./gradlew :beam-sdks-java-nexmark:run \
+        -Pnexmark.runner=":beam-runners-direct-java" \
+        -Pnexmark.args="
+            --runner=DirectRunner
+            --streaming=false
+            --suite=SMOKE
+            --manageResources=false
+            --monitorJobs=true
+            --enforceEncodability=true
+            --enforceImmutability=true"

 Streaming Mode:

-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pdirect-runner -Dexec.args="--runner=DirectRunner --suite=SMOKE --streaming=true --manageResources=false --monitorJobs=true --enforceEncodability=true --enforceImmutability=true"
-
+    ./gradlew :beam-sdks-java-nexmark:run \
+        -Pnexmark.runner=":beam-runners-direct-java" \
+        -Pnexmark.args="
+            --runner=DirectRunner
+            --streaming=true
+            --suite=SMOKE
+            --manageResources=false
+            --monitorJobs=true
+            --enforceEncodability=true
+            --enforceImmutability=true"

 ### Running SMOKE suite on the SparkRunner (local)

+The SparkRunner is special-cased in the Nexmark gradle launch. The task will
+provide the version of Spark that the SparkRunner is built against, and
+configure logging.
+
 Batch Mode:

-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pspark-runner "-Dexec.args=--runner=SparkRunner --suite=SMOKE --streamTimeout=60 --streaming=false --manageResources=false --monitorJobs=true"
+    ./gradlew :beam-sdks-java-nexmark:run \
+        -Pnexmark.runner=":beam-runners-spark" \
+        -Pnexmark.args="
+            --runner=SparkRunner
+            --suite=SMOKE
+            --streamTimeout=60
+            --streaming=false
+            --manageResources=false
+            --monitorJobs=true"

 Streaming Mode:

-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pspark-runner "-Dexec.args=--runner=SparkRunner --suite=SMOKE --streamTimeout=60 --streaming=true --manageResources=false --monitorJobs=true"
-
+    ./gradlew :beam-sdks-java-nexmark:run \
+        -Pnexmark.runner=":beam-runners-spark" \
+        -Pnexmark.args="
+            --runner=SparkRunner
+            --suite=SMOKE
+            --streamTimeout=60
+            --streaming=true
+            --manageResources=false
+            --monitorJobs=true"

 ### Running SMOKE suite on the FlinkRunner (local)

 Batch Mode:

-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pflink-runner "-Dexec.args=--runner=FlinkRunner --suite=SMOKE --streamTimeout=60 --streaming=false --manageResources=false --monitorJobs=true --flinkMaster=local"
+    ./gradlew :beam-sdks-java-nexmark:run \
+        -Pnexmark.runner=":beam-runners-flink_2.11" \
+        -Pnexmark.args="
+            --runner=FlinkRunner
+            --suite=SMOKE
+            --streamTimeout=60
+            --streaming=false
+            --manageResources=false
+            --monitorJobs=true
+            --flinkMaster=local"

 Streaming Mode:

-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pflink-runner "-Dexec.args=--runner=FlinkRunner --suite=SMOKE --streamTimeout=60 --streaming=true --manageResources=false --monitorJobs=true --flinkMaster=local"
-
+    ./gradlew :beam-sdks-java-nexmark:run \
+        -Pnexmark.runner=":beam-runners-flink_2.11" \
+        -Pnexmark.args="
+            --runner=FlinkRunner
+            --suite=SMOKE
+            --streamTimeout=60
+            --streaming=true
+            --manageResources=false
+            --monitorJobs=true
+            --flinkMaster=local"

 ### Running SMOKE suite on the ApexRunner (local)

 Batch Mode:

-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Papex-runner "-Dexec.args=--runner=ApexRunner --suite=SMOKE --streamTimeout=60 --streaming=false --manageResources=false --monitorJobs=false"
+    ./gradlew :beam-sdks-java-nexmark:run \
+        -Pnexmark.runner=":beam-runners-apex" \
+        -Pnexmark.args="
+            --runner=ApexRunner
+            --suite=SMOKE
+            --streamTimeout=60
+            --streaming=false
+            --manageResources=false
+            --monitorJobs=true"

 Streaming Mode:

-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Papex-runner "-Dexec.args=--runner=ApexRunner --suite=SMOKE --streamTimeout=60 --streaming=true --manageResources=false --monitorJobs=false"
-
+    ./gradlew :beam-sdks-java-nexmark:run \
+        -Pnexmark.runner=":beam-runners-apex" \
+        -Pnexmark.args="
+            --runner=ApexRunner
+            --suite=SMOKE
+            --streamTimeout=60
+            --streaming=true
+            --manageResources=false
+            --monitorJobs=true"

 ### Running SMOKE suite on Google Cloud Dataflow

-Building package:
-
-    mvn clean package -Pdataflow-runner
-
-Submit to Google Dataflow service:
-
-
-```
-java -cp sdks/java/nexmark/target/beam-sdks-java-nexmark-bundled-{{ site.release_latest }}.jar \
-  org.apache.beam.sdk.nexmark.Main \
-  --runner=DataflowRunner
-  --project=<your project> \
-  --zone=<your zone> \
-  --workerMachineType=n1-highmem-8 \
-  --stagingLocation=gs://<a gs path for staging> \
-  --streaming=true \
-  --sourceType=PUBSUB \
-  --pubSubMode=PUBLISH_ONLY \
-  --pubsubTopic=<an existing Pubsub topic> \
-  --resourceNameMode=VERBATIM \
-  --manageResources=false \
-  --monitorJobs=false \
-  --numEventGenerators=64 \
-  --numWorkers=16 \
-  --maxNumWorkers=16 \
-  --suite=SMOKE \
-  --firstEventRate=100000 \
-  --nextEventRate=100000 \
-  --ratePeriodSec=3600 \
-  --isRateLimited=true \
-  --avgPersonByteSize=500 \
-  --avgAuctionByteSize=500 \
-  --avgBidByteSize=500 \
-  --probDelayedEvent=0.000001 \
-  --occasionalDelaySec=3600 \
-  --numEvents=0 \
-  --useWallclockEventTime=true \
-  --usePubsubPublishTime=true \
-  --experiments=enable_custom_pubsub_sink
-```
-
-```
-java -cp sdks/java/nexmark/target/beam-sdks-java-nexmark-bundled-{{ site.release_latest }}.jar \
-  org.apache.beam.sdk.nexmark.Main \
-  --runner=DataflowRunner
-  --project=<your project> \
-  --zone=<your zone> \
-  --workerMachineType=n1-highmem-8 \
-  --stagingLocation=gs://<a gs path for staging> \
-  --streaming=true \
-  --sourceType=PUBSUB \
-  --pubSubMode=SUBSCRIBE_ONLY \
-  --pubsubSubscription=<an existing Pubsub subscription to above topic> \
-  --resourceNameMode=VERBATIM \
-  --manageResources=false \
-  --monitorJobs=false \
-  --numWorkers=64 \
-  --maxNumWorkers=64 \
-  --suite=SMOKE \
-  --usePubsubPublishTime=true \
-  --outputPath=gs://<a gs path under which log files will be written> \
-  --windowSizeSec=600 \
-  --occasionalDelaySec=3600 \
-  --maxLogEvents=10000 \
-  --experiments=enable_custom_pubsub_source
-```
+Set these up first so the below command is valid
+
+    PROJECT=<your project>
+    ZONE=<your zone>
+    STAGING_LOCATION=gs://<a GCS path for staging>
+    PUBSUB_TOPIC=<existing pubsub topic>
+
+Launch:
+
+    ./gradlew :beam-sdks-java-nexmark:run \
+        -Pnexmark.runner=":beam-runners-google-cloud-dataflow" \
+        -Pnexmark.args="
+            --runner=DataflowRunner
+            --suite=SMOKE
+            --streamTimeout=60
+            --streaming=true
+            --manageResources=false
+            --monitorJobs=true
+            --project=${PROJECT}
+            --zone=${ZONE}
+            --workerMachineType=n1-highmem-8
+            --stagingLocation=${STAGING_LOCATION}
+            --streaming=true
+            --sourceType=PUBSUB
+            --pubSubMode=PUBLISH_ONLY
+            --pubsubTopic=${PUBSUB_TOPIC}
+            --resourceNameMode=VERBATIM
+            --manageResources=false
+            --monitorJobs=false
+            --numEventGenerators=64
+            --numWorkers=16
+            --maxNumWorkers=16
+            --suite=SMOKE
+            --firstEventRate=100000
+            --nextEventRate=100000
+            --ratePeriodSec=3600
+            --isRateLimited=true
+            --avgPersonByteSize=500
+            --avgAuctionByteSize=500
+            --avgBidByteSize=500
+            --probDelayedEvent=0.000001
+            --occasionalDelaySec=3600
+            --numEvents=0
+            --useWallclockEventTime=true
+            --usePubsubPublishTime=true
+            --experiments=enable_custom_pubsub_sink"

 ### Running query 0 on a Spark cluster with Apache Hadoop YARN

-
 Building package:

-    mvn clean package -Pspark-runner
+    ./gradlew :beam-sdks-java-nexmark:assemble

 Submit to the cluster:

-    spark-submit --master yarn-client --class org.apache.beam.sdk.nexmark.Main --driver-memory 512m --executor-memory 512m --executor-cores 1 beam-sdks-java-nexmark-bundled-{{ site.release_latest }}.jar --runner=SparkRunner --query=0 --streamTimeout=60 --streaming=false --manageResources=false --monitorJobs=true
+    spark-submit \
+        --class org.apache.beam.sdk.nexmark.Main \
+        --master yarn-client \
+        --driver-memory 512m \
+        --executor-memory 512m \
+        --executor-cores 1 \
+        sdks/java/nexmark/build/libs/beam-sdks-java-nexmark-{{ site.release_latest }}-spark.jar \
+        --runner=SparkRunner \
+        --query=0 \
+        --streamTimeout=60 \
+        --streaming=false \
+        --manageResources=false \
+        --monitorJobs=true
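
Note on the Dataflow launch in this commit: the `-Pnexmark.args` string depends on shell variables that must be set beforehand. The following is a minimal sketch, assuming a POSIX shell, of guarding each expansion with `${VAR:?message}` so that an unset or misspelled variable aborts the script instead of expanding to an empty flag value. The values `my-project`, `us-central1-f`, `gs://my-bucket/staging` and `my-topic` are hypothetical placeholders, and only a subset of the flags from the diff is shown. The script prints the command for inspection and does not invoke Gradle.

```shell
#!/bin/sh
# Hypothetical placeholder values; substitute real ones before launching.
PROJECT="my-project"
ZONE="us-central1-f"
STAGING_LOCATION="gs://my-bucket/staging"
PUBSUB_TOPIC="my-topic"

# ${VAR:?msg} aborts with msg if VAR is unset or empty, so a misspelled
# variable name cannot silently expand to an empty flag value.
NEXMARK_ARGS="
    --runner=DataflowRunner
    --suite=SMOKE
    --streaming=true
    --project=${PROJECT:?set PROJECT first}
    --zone=${ZONE:?set ZONE first}
    --stagingLocation=${STAGING_LOCATION:?set STAGING_LOCATION first}
    --sourceType=PUBSUB
    --pubSubMode=PUBLISH_ONLY
    --pubsubTopic=${PUBSUB_TOPIC:?set PUBSUB_TOPIC first}"

# Print the launch command for inspection rather than running it.
printf '%s\n' "./gradlew :beam-sdks-java-nexmark:run \\"
printf '%s\n' "    -Pnexmark.runner=\":beam-runners-google-cloud-dataflow\" \\"
printf '%s\n' "    -Pnexmark.args=\"${NEXMARK_ARGS}\""
```

Once the printed command looks right, it can be pasted into a terminal at the root of a Beam checkout.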
