GitHub user raboof added a comment to the discussion: Pekko Cluster Sharding - Race condition
GitHub link: https://github.com/apache/pekko/discussions/1508#discussioncomment-10799003

> Cluster Sharding what are probabilities of a file being processed once at a
> given time across all the pods in the cluster?
>
> Each entity in a sharded cluster is uniquely identified by its entityId as
> far the documentation mentions.

Yes: in a healthy cluster, you can be confident that there will be at most one actor running for each entity id. As such, cluster sharding is indeed sufficient to ensure that a file is processed by at most one actor at a given time across all the pods in the cluster.

> will it ensure at a given time the file is processed by exactly one actor in
> the cluster, or we need to have a locking mechanism in place / or implement
> the file-processing logic in such a way that it ensure to be idempotent

As mentioned in https://github.com/apache/pekko-connectors/discussions/814, cluster sharding on its own is not sufficient to get exactly-once delivery. If the processing of a file is interrupted for some reason, you need some way to decide whether that processing must be restarted or resumed. You don't need any additional locking for this, but making the upload idempotent would indeed help solve this aspect.

> If there are too many files in the source HDFS directory then there will be
> good number of actors at a given time will be created - will it add to a
> performance bottleneck

Actors are typically cheap, so the number of actors by itself is unlikely to be a performance bottleneck. However, if there are many files, I could imagine you'd overload your system by starting too many uploads in parallel. You could restrict the number of parallel uploads in your HDFS scanning code.
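A minimal sketch of the last two answers, written in plain Python rather than against the Pekko API. `process_file` stands in for what the sharded entity would do for one file id: skip the file if a completion record exists, otherwise upload it to a destination keyed by the file id and then record completion. `scan_and_upload` stands in for the HDFS scanning code and caps the number of uploads in flight. `CompletionLedger`, `FakeUploadTarget`, `process_file`, `scan_and_upload` and `max_parallel_uploads` are all hypothetical names. The ledger is kept in memory here, whereas a real one would need durable storage.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class CompletionLedger:
    """Hypothetical record of files whose upload has finished.

    In-memory stand-in; a real system would persist this so it survives
    restarts. The lock only makes the set thread-safe inside this process,
    it is NOT a cluster-wide lock.
    """

    def __init__(self):
        self._done = set()
        self._lock = threading.Lock()

    def is_done(self, file_id):
        with self._lock:
            return file_id in self._done

    def mark_done(self, file_id):
        with self._lock:
            self._done.add(file_id)


class FakeUploadTarget:
    """Hypothetical upload destination that stores objects keyed by file id.

    Uploading the same file twice overwrites the earlier copy instead of
    creating a duplicate. It also tracks how many uploads ran concurrently.
    """

    def __init__(self):
        self.objects = {}
        self.upload_calls = 0
        self.max_concurrent = 0
        self._active = 0
        self._lock = threading.Lock()

    def put(self, key, data):
        with self._lock:
            self.upload_calls += 1
            self._active += 1
            self.max_concurrent = max(self.max_concurrent, self._active)
        time.sleep(0.01)  # simulate network / HDFS I/O
        with self._lock:
            self.objects[key] = data
            self._active -= 1


def process_file(file_id, ledger, target):
    """Idempotently process one file; file_id plays the role of the entity id.

    Assumes at most one caller per file_id at a time, which is what cluster
    sharding provides for a given entity id.
    """
    if ledger.is_done(file_id):
        return "skipped"  # finished in an earlier attempt
    # If the process dies after put() but before mark_done(), a retry just
    # uploads to the same key again, which is harmless.
    target.put(file_id, f"contents of {file_id}")
    ledger.mark_done(file_id)
    return "uploaded"


def scan_and_upload(file_ids, ledger, target, max_parallel_uploads=4):
    """Process the files with at most max_parallel_uploads uploads in flight."""
    with ThreadPoolExecutor(max_workers=max_parallel_uploads) as pool:
        return list(pool.map(lambda f: process_file(f, ledger, target), file_ids))


if __name__ == "__main__":
    files = [f"/data/in/file-{i}.csv" for i in range(10)]
    ledger, target = CompletionLedger(), FakeUploadTarget()
    print(scan_and_upload(files, ledger, target, max_parallel_uploads=3))
    print(scan_and_upload(files, ledger, target))  # re-run: everything is skipped
    print(target.upload_calls, target.max_concurrent)
```

Re-running `scan_and_upload` over the same files skips everything already recorded. A file that was uploaded but never recorded (for example because the process crashed in between) is simply uploaded again to the same key. If the HDFS scanner is built with Pekko Streams, `mapAsync(parallelism)` is one way to express the same bound on uploads in flight.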