Hi, pinging this thread (maybe some folks missed it). What do you think about these concerns and ideas?
Łukasz

On Mon, Jan 14, 2019 at 17:11, Łukasz Gajowy <[email protected]> wrote:

> Hi all,
>
> one problem we need to solve while working on the load tests we are
> currently developing is that we don't really know how many GCP/Jenkins
> resources we can occupy. We did some initial testing with
> beam_Java_LoadTests_GroupByKey_Dataflow_Small [1] and it seems that for:
>
> - 1 000 000 000 (~23 GB) synthetic records
> - 10 fanouts
> - 10 Dataflow workers (--maxNumWorkers)
>
> the total job time exceeds 4 hours. That seems like too much for such a
> small load test. Additionally, we plan to add much bigger tests for other
> core operations too. The proposal [2] describes only a few of them.
>
> The questions are:
> 1. How many workers can we assign to this job without starving the other
> jobs? Are 32 workers for a single Dataflow job fine? Would 64 workers for
> such a job be fine as well?
> 2. Given that we plan to add more and more load tests soon, do you think
> it is a good idea to create a separate GCP project + separate Jenkins
> workers for load testing purposes only? This would avoid starving the
> critical tests (post-commits, pre-commits, etc.). Or maybe there is
> another solution that would bring such isolation? Is such isolation
> needed?
>
> Ad 2: Please note that we will also need to host Flink/Spark clusters
> later on GKE/Dataproc (not decided yet).
>
> [1] https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_Java_LoadTests_GroupByKey_Dataflow_Small_PR/
> [2] https://s.apache.org/load-test-basic-operations
>
> Thanks,
> Łukasz
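For anyone who hasn't looked at the job itself, here is a minimal, self-contained Java sketch of what "1 000 000 000 records with fanout 10" means in pipeline terms. To be clear, this is NOT the actual load test code (the real test uses Beam's synthetic sources and the options described in [2]); the class name, key distribution, and record shape below are illustrative only.

// Illustrative sketch only: the real load test reads synthetic records
// (~23 GB total, i.e. roughly 23 bytes per record) rather than a sequence.
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

public class GbkFanoutSketch {
  public static void main(String[] args) {
    // Runner, project, maxNumWorkers, etc. come from the command line,
    // e.g. --runner=DataflowRunner --maxNumWorkers=10.
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // ~1e9 elements, keyed so that GroupByKey has real work to do.
    // The modulus used here is arbitrary, not what the real test uses.
    PCollection<KV<Long, Long>> input =
        p.apply(GenerateSequence.from(0).to(1_000_000_000L))
            .apply(MapElements
                .into(TypeDescriptors.kvs(TypeDescriptors.longs(), TypeDescriptors.longs()))
                .via(x -> KV.of(x % 1000, x)));

    // "10 fanouts": apply the tested operation to 10 branches of the same
    // input, so the GroupByKey step processes ~10x the input volume.
    for (int i = 0; i < 10; i++) {
      input.apply("GroupByKey-" + i, GroupByKey.create());
    }

    p.run().waitUntilFinish();
  }
}

Note that with fanout 10 the GroupByKey stage effectively shuffles ~10x the 23 GB input, which may explain part of the long runtime on only 10 workers and is worth keeping in mind when discussing the 32/64-worker question above.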
