Hi Stephan,

I'm not sure this is the best way to achieve it, but I've seen
parallelism limited by using state on a keyed (KV) PCollection and
restricting the number of keys.
In your case, you could give both non-concurrency-safe operations the
same key. When a DoFn uses state, the Beam model guarantees that elements
with the same key (within the same window) aren't processed concurrently.

This blog post may be helpful:
https://beam.apache.org/blog/stateful-processing/

On Mon, Jun 12, 2023 at 2:21 PM Stephan Hoyer via dev <dev@beam.apache.org>
wrote:

> Can the Beam data model (specifically the Python SDK) support executing
> functions that are idempotent but not concurrency-safe?
>
> I am thinking of a task like setting up a database (or in my case, a Zarr
> <https://zarr.dev/> store in Xarray-Beam
> <https://github.com/google/xarray-beam>) where it is not safe to run
> setup concurrently, but if the whole operation fails it is safe to retry.
>
> I recognize that a better model would be to use entirely atomic
> operations, but sometimes this can be challenging to guarantee for tools
> that were not designed with parallel computing in mind.
>
> Cheers,
> Stephan
>
