Are you using Dataflow Runner v2[1]?

If so, then you can use:
--number_of_worker_harness_threads=X[2]
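For example, a launch command might look like the following (a sketch only — the script name `my_pipeline.py`, project/region values, and thread count are placeholders; the `--experiments=use_runner_v2` flag is how Runner v2 is typically enabled for Python pipelines):

```shell
# Hypothetical launch command; substitute your own script and GCP settings.
python my_pipeline.py \
  --runner=DataflowRunner \
  --project=my-project \
  --region=us-central1 \
  --streaming \
  --experiments=use_runner_v2 \
  --number_of_worker_harness_threads=12
```

Lowering the thread count reduces the number of bundles processed concurrently on each worker, which can cap peak memory usage at the cost of throughput.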

Do you know where/why the OOM is occurring?

1:
https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#dataflow-runner-v2
2:
https://github.com/apache/beam/blob/017936f637b119f0b0c0279a226c9f92a2cf4f15/sdks/python/apache_beam/options/pipeline_options.py#L834

On Thu, Aug 20, 2020 at 7:33 AM Kamil Wasilewski <
[email protected]> wrote:

> Hi all,
>
> As I stated in the title, is there an equivalent for
> --numberOfWorkerHarnessThreads in Python SDK? I've got a streaming pipeline
> in Python which suffers from OutOfMemory exceptions (I'm using Dataflow).
> Switching to highmem workers solved the issue, but I wonder if I can set a
> limit of threads that will be used in a single worker to decrease memory
> usage.
>
> Regards,
> Kamil
>
>
