GitHub user Susmit07 edited a discussion: Pekko Cluster Sharding - Race 
condition

In Cluster Sharding, how likely is it that a file is processed exactly once at 
a given time across all the pods in the cluster?


Each entity in a sharded cluster is uniquely identified by its entityId, as far 
as the documentation mentions.

I have 2 doubts:

- For each file within an HDFS directory, if we provide a unique entity ID by 
hashing the file path, will that ensure that at any given time the file is 
processed by exactly one actor (node) in the cluster? Or do we need a locking 
mechanism in place, or to implement the file-processing logic so that it is 
idempotent?
- If there are too many files in the source HDFS directory, a large number of 
actors will be created at a given time. Will that become a performance 
bottleneck?
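
To make the first point concrete, here is a minimal sketch of deriving an 
entity ID from a file path and mapping it to a shard. It uses only the Java 
standard library, not the Pekko API. The class name FileEntityIds, the shard 
count of 100, and the example path are assumptions made for illustration. The 
hash-modulo mapping is only similar in spirit to what a hash-based message 
extractor does. It shows that the same path always yields the same entity ID 
and shard, and it does not by itself answer whether a lock is still needed.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class FileEntityIds {
    // Hypothetical shard count, chosen only for this sketch.
    static final int NUMBER_OF_SHARDS = 100;

    // Derive a stable entity ID from the file path: SHA-256 of the UTF-8 bytes, hex encoded.
    static String entityIdFor(String hdfsPath) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(hdfsPath.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    // Map an entity ID to a shard ID; floorMod keeps the result non-negative.
    static String shardIdFor(String entityId) {
        return String.valueOf(Math.floorMod(entityId.hashCode(), NUMBER_OF_SHARDS));
    }

    public static void main(String[] args) {
        String path = "/data/incoming/part-0001.csv"; // hypothetical path
        String entityId = entityIdFor(path);
        System.out.println(entityId + " -> shard " + shardIdFor(entityId));
    }
}
```

Any message about a given path would then carry the same entity ID, so it 
would be routed to the same entity wherever that entity currently lives.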

Considering the above requirements, is Cluster Sharding an appropriate technique 
to adopt for distributed file download? (A cluster-mode deployment will ensure 
that each file is pulled exactly once for processing, which is good, but the 
problem is that it won't scale when directories and files increase, and idle 
pods in the cluster are another drawback.)
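
For reference, the idempotent alternative I have in mind looks roughly like 
the sketch below. It is plain Java, not Pekko code, and the names 
(IdempotentFileProcessor, process) are made up for illustration. The 
in-memory set stands in for what would have to be a durable, shared store in 
a real cluster, which is an assumption of this sketch.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

final class IdempotentFileProcessor {
    // In-memory stand-in for a durable store of claimed entity IDs (assumption).
    private final Set<String> claimed = ConcurrentHashMap.newKeySet();
    private final Consumer<String> work;

    IdempotentFileProcessor(Consumer<String> work) {
        this.work = work;
    }

    // Returns true if this call did the work, false if the file was already claimed.
    boolean process(String entityId, String path) {
        // add() is atomic: only the first caller for a given ID gets true.
        if (!claimed.add(entityId)) {
            return false;
        }
        try {
            work.accept(path);
            return true;
        } catch (RuntimeException e) {
            // Release the claim so that a later retry can process the file.
            claimed.remove(entityId);
            throw e;
        }
    }

    public static void main(String[] args) {
        AtomicInteger downloads = new AtomicInteger();
        IdempotentFileProcessor processor =
            new IdempotentFileProcessor(path -> downloads.incrementAndGet());
        processor.process("file-1", "/data/incoming/a.csv"); // hypothetical ID and path
        processor.process("file-1", "/data/incoming/a.csv"); // duplicate, skipped
        System.out.println("downloads performed: " + downloads.get());
    }
}
```

With a guard like this, a duplicate request for the same entity ID becomes a 
no-op instead of a second download.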

Hoping contributors can provide some insight. Grateful!

GitHub link: https://github.com/apache/pekko-connectors/discussions/835


