Hi, pinging this thread (maybe some folks missed it). What do you think about these concerns and ideas?
Łukasz

On Mon, Jan 14, 2019 at 17:11, Łukasz Gajowy <[email protected]> wrote:

> Hi all,
>
> one problem we need to solve while working on the load tests we are
> currently developing is that we don't really know how many GCP/Jenkins
> resources we can occupy. We did some initial testing with
> beam_Java_LoadTests_GroupByKey_Dataflow_Small [1] and it seems that for:
>
> - 1 000 000 000 (~23 GB) synthetic records
> - 10 fanouts
> - 10 Dataflow workers (--maxNumWorkers)
>
> the total job time exceeds 4 hours. That seems like too much for such a
> small load test. Additionally, we plan to add much bigger tests for other
> core operations too. The proposal [2] describes only a few of them.
>
> The questions are:
> 1. How many workers can we assign to this job without starving the other
> jobs? Are 32 workers for a single Dataflow job fine? Would 64 workers for
> such a job be fine as well?
> 2. Given that we plan to add more and more load tests soon, do you think
> it is a good idea to create a separate GCP project + separate Jenkins
> workers for load testing purposes only? This would avoid starving the
> critical tests (post-commits, pre-commits, etc.). Or maybe there is
> another solution that would bring such isolation? Is such isolation
> needed?
>
> Ad 2: Please note that we will also need to host Flink/Spark clusters
> later on GKE/Dataproc (not decided yet).
>
> [1] https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_Java_LoadTests_GroupByKey_Dataflow_Small_PR/
> [2] https://s.apache.org/load-test-basic-operations
>
> Thanks,
> Łukasz
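For anyone who hasn't looked at the job itself, here is a minimal, self-contained Java sketch of what "1 000 000 000 records with fanout 10" means in pipeline terms. To be clear, this is NOT the actual load test code (the real test uses Beam's synthetic sources and the options described in [2]); the class name, key distribution, and record shape below are illustrative only.

// Illustrative sketch only: the real load test reads synthetic records
// (~23 GB total, i.e. roughly 23 bytes per record) rather than a sequence.
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

public class GbkFanoutSketch {
  public static void main(String[] args) {
    // Runner, project, maxNumWorkers, etc. come from the command line,
    // e.g. --runner=DataflowRunner --maxNumWorkers=10.
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // ~1e9 elements, keyed so that GroupByKey has real work to do.
    // The modulus used here is arbitrary, not what the real test uses.
    PCollection<KV<Long, Long>> input =
        p.apply(GenerateSequence.from(0).to(1_000_000_000L))
            .apply(MapElements
                .into(TypeDescriptors.kvs(TypeDescriptors.longs(), TypeDescriptors.longs()))
                .via(x -> KV.of(x % 1000, x)));

    // "10 fanouts": apply the tested operation to 10 branches of the same
    // input, so the GroupByKey step processes ~10x the input volume.
    for (int i = 0; i < 10; i++) {
      input.apply("GroupByKey-" + i, GroupByKey.create());
    }

    p.run().waitUntilFinish();
  }
}

Note that with fanout 10 the GroupByKey stage effectively shuffles ~10x the 23 GB input, which may explain part of the long runtime on only 10 workers and is worth keeping in mind when discussing the 32/64-worker question above.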
