We have identified the leak. https://github.com/apache/beam/issues/28246 has the details and workarounds.
On Mon, Aug 28, 2023 at 9:57 AM Valentyn Tymofieiev <valen...@google.com> wrote:

> This appears to be a recent issue reported also by others (e.g.
> https://github.com/apache/beam/issues/28142), it's being actively
> investigated. Therefore, it is unlikely that memory fragmentation is an
> issue.
>
> On Tue, Aug 22, 2023 at 5:21 PM Valentyn Tymofieiev <valen...@google.com>
> wrote:
>
>> Hi, thanks for reaching out.
>>
>> I'd be curious to see whether the memory consumption patterns you observe
>> change if you switch the memory allocator library.
>>
>> For example, you could try to use a custom container, install jemalloc
>> and enable it. See:
>> https://beam.apache.org/documentation/runtime/environments ,
>> https://cloud.google.com/dataflow/docs/guides/using-custom-containers
>>
>> Your Dockerfile might look like the following:
>>
>> FROM apache/beam_python3.10_sdk:2.49.0
>>
>> # Prebuilt other dependencies
>> RUN apt-get update \
>> && apt-get install -y libjemalloc-dev
>>
>> ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so
>>
>> # Set the entrypoint to the Apache Beam SDK launcher.
>> ENTRYPOINT ["/opt/apache/beam/boot"]
>>
>>
>> On Tue, Aug 22, 2023 at 10:42 AM Cheng Han Lee <le...@allium.so> wrote:
>>
>>> Hello!
>>>
>>> I'm an avid apache beam user (on Dataflow) and we use beam to stream
>>> blockchain data to various sinks. I recently noticed some memory issues
>>> across all our pipelines but have yet to be able to find the root cause and
>>> was hoping someone on your team might be able to help. If this isn't the
>>> right avenue for it, please let me know how I should reach out.
>>>
>>> The details are here in stackoverflow:
>>>
>>>
>>> https://stackoverflow.com/questions/76950068/memory-leak-in-apache-beam-python-readfrompubsub-io
>>>
>>> Thanks,
>>> Chenghan
>>> CTO | Allium
>>>
>>