Re: [akka-user] Re: Rebalancing shard entries of different types

Luis Medina Mon, 02 Jun 2014 00:56:19 -0700

Given that fact, I went ahead and implemented a ShardAllocationStrategy that 
takes into account different entry types for figuring out what nodes to 
create new entries in. First, I just want to describe the use-case that led 
to me implementing this.


I'm working on a pipeline that will ingest real-time social media data and 
do processing on it. This pipeline will be getting this data from different 
streams (i.e., one for Twitter, one for Facebook, etc.). As such, I wanted to 
take advantage of the high availability and scalability features that 
Akka's clustering feature provides. I considered using singletons for this 
except that this would create a single point of failure in that if my 
eldest node goes down I lose all of my data streams until they're able to 
start up on a different node. Using singletons would have also created a 
resource bottleneck on the eldest node. As such, I decided to make use of 
sharding in order to distribute all of my different streams throughout my 
cluster. While sharding provides the distribution that singletons lack, it 
only works when sharding the same entry type (as you, Patrik, pointed out and 
I came to find out). This presented a problem since in my case an entry 
corresponded to a stream and no two streams were really the same. I did try 
doing something like this:

ClusterSharding.get(system).start("Feed", StreamActor.props(),
    new MessageExtractor());

where instead of initializing the stream through its constructor, I would 
initialize it by passing in the necessary arguments through a series of 
messages using ".tell()". This appeared promising at first, until I 
reached a point where, regardless of how generic I made my interfaces, there 
was always something a 3rd party library required that couldn't be 
abstracted from the Objects that Akka receives. That left me with a bunch 
of "unchecked cast" warnings (which are a "no no" for me). This led 
to my only other option, which was to implement my own 
ShardAllocationStrategy that could work across different entry types. 
It looks like this:

public class FeedShardAllocationStrategy
        implements ShardCoordinator.ShardAllocationStrategy {
    private HashMap<String, Long> shardCount;

    public FeedShardAllocationStrategy(HashMap<String, Long> shardCount) {
        Validate.notNull(shardCount, "shardCount must not be null.");

        this.shardCount = shardCount;
    }

    @Override
    public ActorRef allocateShard(ActorRef actorRef, String s,
            Map<ActorRef, IndexedSeq<String>> actorRefIndexedSeqMap) {
        synchronized (this) {
            Iterator<ActorRef> shardRegions =
                actorRefIndexedSeqMap.keysIterator();

            ActorRef leastPopulatedShardRegion =
                new ShardCoordinator.LeastShardAllocationStrategy(1, 3)
                    .allocateShard(actorRef, s, actorRefIndexedSeqMap);
            long leastPopulatedShardCount = Long.MAX_VALUE;

            while (shardRegions.hasNext()) {
                ActorRef shardRegion = shardRegions.next();
                String address = shardRegion.path().address().toString();

                if (shardCount.get(address) == null) {
                    leastPopulatedShardCount = 0;
                    leastPopulatedShardRegion = shardRegion;
                } else {
                    long shardCount = this.shardCount.get(address);

                    if (shardCount < leastPopulatedShardCount) {
                        leastPopulatedShardCount = shardCount;
                        leastPopulatedShardRegion = shardRegion;
                    }
                }
            }

            
            shardCount.put(
                leastPopulatedShardRegion.path().address().toString(),
                leastPopulatedShardCount + 1);
            return leastPopulatedShardRegion;
        }
    }

    @Override
    public Set<String> rebalance(
            scala.collection.immutable.Map<ActorRef,
                scala.collection.immutable.IndexedSeq<String>> actorRefIndexedSeqMap,
            Set<String> stringSet) {
        return new HashSet<>();
    }
}

As you can see in the allocateShard method, I use a HashMap to keep track 
of the number of shards that have been created on each node, keyed by the 
shard region's address. By default I assign my leastPopulatedShardRegion to 
be whatever ShardRegion the LeastShardAllocationStrategy provides. For the 
"rebalance" method, since I don't want my streams to stop during a 
re-balance, I return an empty Set in order to prevent this. This actually 
works quite well with my requirements as long as every stream type shares 
the same instance of the ShardAllocationStrategy. For example doing:

ShardCoordinator.ShardAllocationStrategy shardAllocationStrategy =
    new FeedShardAllocationStrategy(new HashMap<>());

ClusterSharding.get(system).start("TwitterStream",
    StreamActor.props(new TwitterStreamManager()),
    new MessageExtractor(), shardAllocationStrategy);
ClusterSharding.get(system).start("FacebookStream",
    StreamActor.props(new FacebookStreamManager()),
    new MessageExtractor(), shardAllocationStrategy);
ClusterSharding.get(system).start("GooglePlusStream",
    StreamActor.props(new GooglePlusStreamManager()),
    new MessageExtractor(), shardAllocationStrategy);

allows for my streams to be evenly distributed across my cluster while at 
the same time allowing me to implement different behavior for each one 
without compromising the "cleanliness" of the rest of my application (in 
addition to having the streams restart if the node they are on goes down).
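
To make the counting idea easier to follow on its own, here is a minimal sketch of just the "pick the least populated node" step with the Akka types stripped out. It is an illustration under assumptions, not the strategy above: the class name LeastPopulatedSketch, the allocate and countFor methods, and the host addresses are all made up for this example, and nodes are plain address Strings instead of ActorRefs. Unlike the code above, ties go to the first node in the list.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the allocation step; no Akka dependency.
class LeastPopulatedSketch {
    // One counter map shared by every entry type, mirroring the single
    // shared strategy instance passed to each start() call.
    private final Map<String, Long> shardCount = new HashMap<>();

    // Returns the address with the fewest recorded allocations and
    // records the new allocation against it. Ties go to the first address.
    public synchronized String allocate(List<String> addresses) {
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("addresses must not be empty");
        }
        String least = null;
        long leastCount = Long.MAX_VALUE;
        for (String address : addresses) {
            long count = shardCount.getOrDefault(address, 0L);
            if (count < leastCount) {
                leastCount = count;
                least = address;
            }
        }
        shardCount.put(least, leastCount + 1);
        return least;
    }

    // Number of allocations recorded so far for the given address.
    public synchronized long countFor(String address) {
        return shardCount.getOrDefault(address, 0L);
    }

    public static void main(String[] args) {
        LeastPopulatedSketch strategy = new LeastPopulatedSketch();
        // Made-up node addresses, for illustration only.
        List<String> nodes = Arrays.asList("host1", "host2", "host3");
        for (String type : Arrays.asList("TwitterStream", "FacebookStream", "GooglePlusStream")) {
            System.out.println(type + " -> " + strategy.allocate(nodes));
        }
    }
}
```

Because all three stream types go through the same counter map, each one lands on a different node here. If each type had its own map, every map would start empty and all three would pick the first node.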

Now I realize that sharding isn't really meant to work across different 
entry types and that sharding is meant to be used to scale out the 
functionality of a particular entry type by creating multiple instances of 
them throughout a cluster, unlike the way that I'm using it, where I only 
have a single instance of each entry type running in the entirety of my 
cluster. I guess I just had a very specific use-case that I wanted to 
implement in Akka without having to involve additional third party tools. 
What do you guys think? Thoughts, comments?

