We have identified the leak. https://github.com/apache/beam/issues/28246
has the details and workarounds.

On Mon, Aug 28, 2023 at 9:57 AM Valentyn Tymofieiev <valen...@google.com>
wrote:

> This appears to be a recent issue that others have also reported (e.g.
> https://github.com/apache/beam/issues/28142), and it is being actively
> investigated. Memory fragmentation is therefore unlikely to be the
> cause.
>
> On Tue, Aug 22, 2023 at 5:21 PM Valentyn Tymofieiev <valen...@google.com>
> wrote:
>
>> Hi, thanks for reaching out.
>>
>> I'd be curious to see whether the memory consumption patterns you observe
>> change if you switch the memory allocator library.
>>
>> For example, you could try to use a custom container, install jemalloc
>> and enable it. See:
>> https://beam.apache.org/documentation/runtime/environments ,
>> https://cloud.google.com/dataflow/docs/guides/using-custom-containers
>>
>> Your Dockerfile might look like the following:
>>
>> FROM apache/beam_python3.10_sdk:2.49.0
>>
>> # Install jemalloc
>> RUN apt-get update \
>>   && apt-get install -y libjemalloc-dev
>>
>> ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so
>>
>> # Set the entrypoint to the Apache Beam SDK launcher.
>> ENTRYPOINT ["/opt/apache/beam/boot"]
>>
>>
>> On Tue, Aug 22, 2023 at 10:42 AM Cheng Han Lee <le...@allium.so> wrote:
>>
>>> Hello!
>>>
>>> I'm an avid Apache Beam user (on Dataflow), and we use Beam to stream
>>> blockchain data to various sinks. I recently noticed memory issues
>>> across all our pipelines but have not yet found the root cause, and I
>>> was hoping someone on your team might be able to help. If this isn't the
>>> right avenue for it, please let me know how I should reach out.
>>>
>>> The details are here on Stack Overflow:
>>>
>>>
>>> https://stackoverflow.com/questions/76950068/memory-leak-in-apache-beam-python-readfrompubsub-io
>>>
>>> Thanks,
>>> Chenghan
>>> CTO | Allium
>>>
>>

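For anyone who still wants to run the allocator experiment from the quoted
Dockerfile above: a wrong LD_PRELOAD path does not stop the process from
starting, so it can be worth confirming that jemalloc was really loaded. The
following is only a minimal sketch, assuming a Linux worker where
/proc/self/maps is readable. The helper name preloaded_allocator and its
parameters are hypothetical, not part of Beam or Dataflow.

```python
import os


def preloaded_allocator(maps_path="/proc/self/maps", needle="jemalloc"):
    """Return the path of the first mapped file whose name contains `needle`.

    Returns None if nothing matches or the maps file cannot be read.
    (Hypothetical helper for illustration; not a Beam API.)
    """
    try:
        with open(maps_path) as maps:
            for line in maps:
                # Fields: address, perms, offset, dev, inode, [pathname].
                # Split at most 5 times so a path with spaces stays whole.
                fields = line.split(None, 5)
                if len(fields) < 6:
                    continue  # anonymous mapping, no backing file
                path = fields[5].strip()
                if needle in os.path.basename(path):
                    return path
    except OSError:
        # /proc is Linux-specific; treat an unreadable file as "unknown".
        return None
    return None


if __name__ == "__main__":
    print("LD_PRELOAD =", os.environ.get("LD_PRELOAD"))
    print("jemalloc mapping =", preloaded_allocator())
```

One way to use it would be to call the helper from a DoFn's setup method and
log the result, so the check runs inside the SDK worker process and not on
the machine that submits the job.
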