GitHub user raboof added a comment to the discussion: Pekko Cluster Sharding - 
Race condition

> Cluster Sharding what are probabilities of a file being processed once at a 
> given time across all the pods in the cluster?
> 
> Each entity in a sharded cluster is uniquely identified by its entityId as 
> far the documentation mentions.

Yes: in a healthy cluster, you can be confident there will be at most one 
actor running for each entity id. Cluster sharding is therefore sufficient to 
ensure that a file is processed by only one actor at a given time across all 
the pods in the cluster.

> will it ensure at a given time the file is processed by exactly one actor in 
> the cluster, or we need to have a locking mechanism in place / or implement 
> the file-processing logic in such a way that it ensure to be idempotent

As mentioned in https://github.com/apache/pekko-connectors/discussions/814, 
cluster sharding on its own is not sufficient to get exactly-once delivery: if 
file processing is interrupted for some reason, you need some way to decide 
whether to restart or resume it. You don't need any additional locking for 
this, but making the upload idempotent would indeed help to solve this aspect.
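
To make "idempotent" concrete, here is a minimal sketch using only the Java standard library, with the local file system standing in for the real upload target. `IdempotentUpload` and `uploadOnce` are hypothetical names, not part of Pekko. It assumes that at most one caller works on a given destination at a time, which is what cluster sharding provides per entity id.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Pekko: copies a file in such a way that
// repeating the call after a crash or restart leaves the same end state.
final class IdempotentUpload {

    // Returns true if this call did the copy, false if the destination
    // was already complete and nothing had to be done.
    static boolean uploadOnce(Path source, Path destination) {
        try {
            if (Files.exists(destination)) {
                return false; // a previous run already finished this file
            }
            // Write to a temporary sibling first, so that an interrupted
            // run never leaves a half-written file at the final path.
            Path partial = destination.resolveSibling(
                    destination.getFileName() + ".partial");
            Files.copy(source, partial, StandardCopyOption.REPLACE_EXISTING);
            // Publish the finished file in a single step.
            Files.move(partial, destination, StandardCopyOption.ATOMIC_MOVE);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With something like this, an entity actor that is restarted mid-upload can simply call `uploadOnce` again: a leftover `.partial` file is overwritten, and a file that was already published is skipped.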

> If there are too many files in the source HDFS directory then there will be 
> good number of actors at a given time will be created - will it add to a 
> performance bottleneck

Actors are typically cheap, so in that sense the number of actors will not be a 
performance bottleneck. However, if there are many files, I could imagine you'd 
overload your system by starting too many uploads in parallel. You could 
probably restrict the number of parallel uploads from your HDFS scanning code.
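
As a rough illustration of restricting parallel uploads, the sketch below uses a fixed-size thread pool from the Java standard library. `BoundedUploads`, `runAll` and `maxParallel` are hypothetical names, and each `Runnable` stands in for whatever your scanning code does to start one upload; this is one possible approach under those assumptions, not the only one.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Pekko: runs all uploads, but never more
// than maxParallel of them at the same time.
final class BoundedUploads {

    static void runAll(List<Runnable> uploads, int maxParallel) {
        // The pool size is the cap: extra tasks wait in the pool's queue.
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        try {
            for (Runnable upload : uploads) {
                pool.submit(upload);
            }
        } finally {
            pool.shutdown(); // accept no new tasks, let queued ones finish
        }
        try {
            // Block until every queued upload has completed.
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```

If the scanner is written with Pekko Streams, the `parallelism` argument of `mapAsync` can play a similar role.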

GitHub link: 
https://github.com/apache/pekko/discussions/1508#discussioncomment-10799003

----
This is an automatically sent email for notifications@pekko.apache.org.
To unsubscribe, please send an email to: 
notifications-unsubscr...@pekko.apache.org

