GitHub user Susmit07 closed a discussion: Pekko Cluster Sharding - Race 
condition

Hello Developers,

We are planning to use Pekko connectors to observe a file directory in 
distributed storage such as HDFS and pull parquet files whenever they become 
available. We have thought of multiple approaches to cluster deployment, each 
with its own pros and cons, one of them being Cluster Sharding.

With Cluster Sharding, what is the likelihood that a file is processed only 
once at a given time across all the pods in the cluster?

As far as the documentation mentions, each entity in a sharded cluster is 
uniquely identified by its entityId.

I have two questions:

1. For each file within an HDFS directory, if we derive a unique entity ID by 
hashing the file path, will that ensure the file is processed by exactly one 
actor (node) in the cluster at any given time, or do we need a locking 
mechanism in place, or need to implement the file-processing logic so that it 
is idempotent? (A minimal sketch of what we have in mind follows after the 
second question.)

2. If there are a large number of files in the source HDFS directory, a 
correspondingly large number of actors will be alive at any given time - will 
that become a performance bottleneck?

Considering the above requirements, is Cluster Sharding an appropriate 
technique for distributed file download? (A Cluster Singleton deployment would 
ensure that each file is pulled exactly once, which is good for our use case, 
but it won't scale as the number of directories and files grows, and idle pods 
in the cluster are another drawback, so we thought of exploring Cluster 
Sharding instead.)
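
For comparison, the Cluster Singleton alternative we considered would look 
roughly like the sketch below (DirectoryPoller and Poll are again placeholder 
names of ours):

```scala
import org.apache.pekko.actor.typed.{ ActorRef, ActorSystem, Behavior }
import org.apache.pekko.actor.typed.scaladsl.Behaviors
import org.apache.pekko.cluster.typed.{ ClusterSingleton, SingletonActor }

object DirectoryPoller {
  sealed trait Command
  case object Poll extends Command

  def apply(): Behavior[Command] =
    Behaviors.receiveMessage { case Poll =>
      // list the HDFS directory and pull any new parquet files, from one node only
      Behaviors.same
    }

  // exactly one DirectoryPoller runs in the whole cluster; every node gets a proxy ref
  def initSingleton(system: ActorSystem[_]): ActorRef[Command] =
    ClusterSingleton(system).init(SingletonActor(DirectoryPoller(), "DirectoryPoller"))
}
```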

Hoping contributors can provide some insight, grateful!

GitHub link: https://github.com/apache/pekko/discussions/1507
