[GitHub] [beam] kamilwu commented on a change in pull request #12542: [BEAM-10674] Add Python CoGBK load test for streaming on Dataflow

GitBox Thu, 20 Aug 2020 07:15:19 -0700


kamilwu commented on a change in pull request #12542:
URL: https://github.com/apache/beam/pull/12542#discussion_r474015631




##########
File path: .test-infra/jenkins/job_LoadTests_coGBK_Python.groovy
##########
@@ -147,25 +147,30 @@ def loadTestConfigurations = { datasetName ->
         autoscaling_algorithm: 'NONE'
       ]
     ],
-  ].each { test -> test.pipelineOptions.putAll(additionalPipelineArgs) }
+  ]
+  .each { test -> test.pipelineOptions.putAll(additionalPipelineArgs) }
+  .each { test -> (mode) != 'streaming' ?: addStreamingOptions(test) }
 }
 
-def batchLoadTestJob = { scope, triggeringContext ->
-  scope.description('Runs Python CoGBK load tests on Dataflow runner in batch mode')
-  commonJobProperties.setTopLevelMainJobProperties(scope, 'master', 240)
+def addStreamingOptions(test) {
+  // Use highmem workers to prevent out of memory issues.
+  test.pipelineOptions << [streaming: null,
+    worker_machine_type: 'n1-highmem-4'

Review comment:
       > How is it called in Java SDK?
   
   It's `--numberOfWorkerHarnessThreads`. Some time ago, its equivalent in the
   Python SDK was `--experiments worker_threads=[n]`, but that option was
   removed. I'll open a discussion on dev@ to find out whether this is something
   the community should take care of.
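To make the option handling in the diff easier to follow, here is a minimal Python sketch of the same logic. Shared arguments are merged into every test, and the streaming-only options are added only when `mode` is `'streaming'`. This is not the Beam Jenkins code, which is Groovy. The function names, the `pipeline_options` key, and the convention that a `None` value renders as a bare flag (as `streaming: null` in the diff suggests) are all hypothetical and exist only for illustration.

```python
from copy import deepcopy


def add_streaming_options(test):
    """Hypothetical Python port of addStreamingOptions from the diff."""
    # Assumption: None means "emit the flag with no value", e.g. --streaming.
    # Highmem workers are used to prevent out-of-memory issues.
    test["pipeline_options"].update(
        {"streaming": None, "worker_machine_type": "n1-highmem-4"}
    )


def build_configurations(base_tests, additional_args, mode):
    """Return copies of base_tests with shared and mode-specific options."""
    tests = deepcopy(base_tests)  # leave the caller's templates untouched
    for test in tests:
        # Equivalent of test.pipelineOptions.putAll(additionalPipelineArgs).
        test["pipeline_options"].update(additional_args)
        # Equivalent of (mode) != 'streaming' ?: addStreamingOptions(test).
        if mode == "streaming":
            add_streaming_options(test)
    return tests


def to_flags(options):
    """Render an options dict as command-line flags (assumed convention)."""
    return [
        f"--{key}" if value is None else f"--{key}={value}"
        for key, value in options.items()
    ]


if __name__ == "__main__":
    # Hypothetical single-test template; the real job defines several tests.
    base = [{"pipeline_options": {"autoscaling_algorithm": "NONE"}}]
    for mode in ("batch", "streaming"):
        configs = build_configurations(base, {"job_name": "example"}, mode)
        print(mode, to_flags(configs[0]["pipeline_options"]))
```

Run as a script, it prints the batch flags without `--streaming`, and the streaming flags with `--streaming` and `--worker_machine_type=n1-highmem-4` appended. A plain `if` stands in for the Groovy Elvis expression in the diff, which calls `addStreamingOptions` only when `(mode) != 'streaming'` evaluates to false.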
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
