Hi all,

One problem we need to solve while working on the load tests we are currently developing is that we don't really know how many GCP/Jenkins resources we can occupy. We did some initial testing with beam_Java_LoadTests_GroupByKey_Dataflow_Small [1], and it seems that for:
- 1 000 000 000 synthetic records (~23 GB),
- a fanout of 10,
- 10 Dataflow workers (--maxNumWorkers),

the total job time exceeds 4 hours. That seems like a lot for such a small load test, and we plan to add much bigger tests for the other core operations too; the proposal [2] describes only a few of them. (A simplified sketch of what such a job does is included at the end of this mail.) The questions are:

1. How many workers can we assign to this job without starving the other jobs? Are 32 workers for a single Dataflow job fine? Would 64 be?

2. Given that we are going to add more and more load tests soon, do you think it is a good idea to create a separate GCP project plus separate Jenkins workers for load-testing purposes only? This would avoid starving the critical tests (post-commits, pre-commits, etc.). Or is there another solution that would provide such isolation? Is such isolation needed at all?

Ad 2: Please note that we will also need to host Flink/Spark clusters later, on GKE or Dataproc (not decided yet).

[1] https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_Java_LoadTests_GroupByKey_Dataflow_Small_PR/
[2] https://s.apache.org/load-test-basic-operations

Thanks,
Łukasz
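
P.S. For anyone less familiar with these tests: below is a minimal, self-contained Java sketch of what the job in [1] roughly does. It is NOT the actual load-test harness; the class name, the key distribution, and the way fanout is modelled here are all illustrative (the real test uses a dedicated synthetic source with configurable record sizes), but it shows why the parameters above translate into a lot of shuffle work:

  import org.apache.beam.sdk.Pipeline;
  import org.apache.beam.sdk.io.GenerateSequence;
  import org.apache.beam.sdk.options.PipelineOptionsFactory;
  import org.apache.beam.sdk.transforms.GroupByKey;
  import org.apache.beam.sdk.transforms.MapElements;
  import org.apache.beam.sdk.values.KV;
  import org.apache.beam.sdk.values.PCollection;
  import org.apache.beam.sdk.values.TypeDescriptors;

  public class GbkLoadTestSketch {
    public static void main(String[] args) {
      // Runner configuration comes from the command line, e.g.:
      //   --runner=DataflowRunner --project=... --tempLocation=gs://...
      //   --maxNumWorkers=10
      Pipeline p =
          Pipeline.create(PipelineOptionsFactory.fromArgs(args).withValidation().create());

      long numRecords = 1_000_000_000L; // the "Small" scenario from [1]
      int fanout = 10;

      // Stand-in for the synthetic source: the real harness generates
      // records of a configurable size (~23 bytes each in this scenario).
      PCollection<KV<Long, Long>> input =
          p.apply(GenerateSequence.from(0).to(numRecords))
              .apply(
                  MapElements.into(
                          TypeDescriptors.kvs(TypeDescriptors.longs(), TypeDescriptors.longs()))
                      .via((Long i) -> KV.of(i % 1000, i))); // illustrative key space

      // One way to model fanout: N parallel GroupByKey branches, each of which
      // shuffles the entire input, so shuffle volume grows linearly with fanout.
      for (int i = 0; i < fanout; i++) {
        input.apply("GroupByKey-" + i, GroupByKey.<Long, Long>create());
      }

      p.run().waitUntilFinish();
    }
  }

With --maxNumWorkers=10, those ten full-input shuffles are spread over at most ten workers, which is why the wall-clock time balloons even for the "Small" data size, and why questions 1 and 2 above matter.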