[akka-user] Re: Multi-Machine Distributed Work

Matt Edwards Tue, 14 Jan 2014 15:18:36 -0800

Hey folks,

Thanks for the replies, and sorry for the delayed response. We are in the
middle of a deployment and things went a bit sideways, so I had to
straighten it back out.


I have read over the docs listed above, and they sound great, but I am a
bit lost on the implementation. I think the solution I am leaning towards
is a cluster-aware router.

I am also reading up on Cluster Singleton, which seems to be an accurate
representation of what I need (single master, many workers). I know it
creates a single point of failure, but I only have one input source and
need to spawn many jobs.

Part of what confuses me is the nature of our program compared with Akka.
It would seem that I need to program some OSGi activators to start up
the cluster when the program starts. After deploying the program to the
various nodes and starting them, I would have them register back to the
master/seed, and then reference that cluster when the actual analysis
happens? Does that seem like a plausible plan of attack?

Thanks again for the feedback. 
matt


On Monday, January 13, 2014 1:04:08 PM UTC-7, Eric Pederson wrote:
>
> Hi Ryan - 
>
> Why do you deploy the actors rather than starting up the actors on the 
> worker nodes and having them register with the master from there?   Is it 
> to keep a proper supervision hierarchy?
>
> Thanks,
>
> On Saturday, January 11, 2014 5:00:49 PM UTC-5, Ryan Tanner wrote:
>>
>> Eric,
>>
>> We give each node a role in its Akka config.  The cluster-aware router 
>> for that role (router is on the supervisor node) then remotely deploys 
>> routees on those nodes so that we don't have to do so manually.  Each 
>> routee only does one piece of work at a time which it delegates to a 
>> "processor" actor—this lets us use this pattern generically for all our 
>> worker roles.  There is more than one routee per box but this makes it much 
>> easier to control the amount of work done on each node.  Without such a 
>> restriction, we ran into issues with OOM errors due to doing too much work 
>> at once.  Not an issue locally but on a relatively anemic EC2 instance it's 
>> a problem.
>>
>> As in the "Balancing Workload Across Nodes" post, those routees register 
>> themselves with the "role leader" which doles out and tracks work in the 
>> same way that post describes.  We never send messages to the router itself, 
>> we just use it to automatically deploy routees for us.
>>
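To make the "one piece of work at a time" idea above concrete, here is a minimal plain-Java sketch of workers pulling jobs from a shared queue. It is only a single-JVM simulation of the pull pattern, not Akka code and not Ryan's implementation: the class name `PullWorkSketch`, the `run` method, and the use of threads in place of routees are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class PullWorkSketch {

    // Processes every job with `workers` threads that each pull one job at a time.
    // Returns a map of job id -> name of the worker that handled it.
    // Assumes job ids are distinct.
    static Map<Integer, String> run(List<Integer> jobs, int workers) {
        BlockingQueue<Integer> pending = new LinkedBlockingQueue<>(jobs);
        Map<Integer, String> handledBy = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>();

        for (int w = 0; w < workers; w++) {
            final String name = "worker-" + w;
            Thread t = new Thread(() -> {
                Integer job;
                // poll() returns null once the queue is empty, which ends this worker.
                // A worker never holds more than one job, so the number of jobs
                // in flight is capped by the number of workers.
                while ((job = pending.poll()) != null) {
                    handledBy.put(job, name); // stand-in for the real processing
                }
            }, name);
            threads.add(t);
            t.start();
        }

        for (Thread t : threads) {
            try {
                t.join(); // wait until every worker has drained the queue
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return handledBy;
    }

    public static void main(String[] args) {
        List<Integer> jobs = new ArrayList<>();
        for (int i = 0; i < 20; i++) jobs.add(i);
        Map<Integer, String> result = run(jobs, 3);
        System.out.println("handled " + result.size() + " jobs");
    }
}
```

Because workers ask for work instead of having it pushed at them, the number of worker threads directly bounds how much is in flight, which is the property described above for avoiding OOM errors on small instances.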
>> On Friday, January 10, 2014 7:51:34 PM UTC-7, Eric Pederson wrote:
>>>
>>> Hi Ryan - I'm curious to learn more about how you are using the 
>>> cluster-aware router and what it provides over the basic clustering 
>>> membership functionality.
>>>
>>> Matt - we are also using a variation of "Balancing Workload Across 
>>> Nodes" (pre Akka Cluster) and it works great.
>>>
>>> Thanks,
>>>
>>> On Friday, January 10, 2014 9:15:20 AM UTC-5, Ryan Tanner wrote:
>>>>
>>>> We use that pattern in conjunction with clustering.  We use a 
>>>> cluster-aware router to deal with node membership within a worker role but 
>>>> we never actually send messages to the router's ActorRef, we let the 
>>>> routees register themselves with the "role leader" as we call it when they 
>>>> start.
>>>>
>>>> On Friday, January 10, 2014 1:29:37 AM UTC-6, Hendrik Schöneberg wrote:
>>>>>
>>>>> As an alternative to the cluster approach you could check out 
>>>>>
>>>>>
>>>>> http://letitcrash.com/post/29044669086/balancing-workload-across-nodes-with-akka-2
>>>>>   
>>>>>
>>>>> This pattern allows you to dynamically add (local or remote) workers 
>>>>> to a master actor.
>>>>>
>>>>> Hendrik
>>>>>
>>>>> On Thursday, January 9, 2014 11:52:31 PM UTC+1, Matt Edwards wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I am doing some initial research into distributed computing for a
>>>>>> project I work on, and I am wondering if Akka may be the solution.
>>>>>> Below I list the project essentials and a few design ideas. I would
>>>>>> appreciate any feedback to help narrow down the amount of literature
>>>>>> I need to read up on.
>>>>>>
>>>>>>
>>>>>> The project is an analysis tool, and it only runs when the user clicks
>>>>>> the go button. Currently this process takes around 45 minutes (which,
>>>>>> compared with similar software, is lightning fast). However, one phase
>>>>>> of this analysis takes around 30 minutes to complete. This phase is a
>>>>>> data extrapolator (it takes a set of input and predicts future data
>>>>>> over a time range). Each piece of data in this phase is also 100%
>>>>>> independent of every other piece (part A doesn't care about part B).
>>>>>> For this reason Akka seems great. On the not-so-great side, I need to
>>>>>> block the mainline process until the extrapolation is done, or else
>>>>>> the analysis doesn't have the data it needs.
>>>>>>
>>>>>> Also to Note: This is all Java based. I have no experience in Scala.
>>>>>>
>>>>>> The current implementation in Java builds a List<Runnable> for all
>>>>>> the data points, creates a CountDownLatch(list.size()), and then
>>>>>> submits the list to the ExecutorService. This spins up one thread on
>>>>>> the CPU per Runnable. When a Future is available the code pulls its
>>>>>> result, stores it in a map, and decrements the latch. Once the latch
>>>>>> reaches zero, the code proceeds.
>>>>>>
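For readers following along, the latch-based approach described above might look roughly like the sketch below. It is not the actual project code: `LatchExtrapolation`, `extrapolate`, and the fixed-size thread pool are assumptions, and each task writes its result straight into a concurrent map instead of going through a Future.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LatchExtrapolation {

    // Hypothetical stand-in for the real 25-30 second extrapolation of one data point.
    static double extrapolate(int dataPoint) {
        return dataPoint * 2.0;
    }

    // Submits one task per data point and blocks until all of them have finished.
    static Map<Integer, Double> runAll(List<Integer> dataPoints, int threads) {
        Map<Integer, Double> results = new ConcurrentHashMap<>();
        CountDownLatch latch = new CountDownLatch(dataPoints.size());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (final Integer point : dataPoints) {
                pool.execute(() -> {
                    try {
                        results.put(point, extrapolate(point));
                    } finally {
                        latch.countDown(); // always decrement, even if the task throws
                    }
                });
            }
            latch.await(); // the mainline blocks here until the count reaches zero
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } finally {
            pool.shutdown();
        }
        return results;
    }

    public static void main(String[] args) {
        List<Integer> points = new ArrayList<>();
        for (int i = 0; i < 100; i++) points.add(i);
        Map<Integer, Double> results =
                runAll(points, Runtime.getRuntime().availableProcessors());
        System.out.println("completed " + results.size() + " extrapolations");
    }
}
```

The `latch.await()` call is the blocking point mentioned earlier in the message; any distributed replacement would need an equivalent "wait for all results" step before the analysis continues.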
>>>>>> I currently peg a 24-core (4x AMD Opteron(tm) Processor 8435 2.6GHz)
>>>>>> server for the full 30 minutes. I have also tested on an 80-core
>>>>>> server, with the same result: max thread usage, though quicker
>>>>>> results. This process is extremely predictable in performance as
>>>>>> well: double the cores, half the time. On my normal 24-core machine,
>>>>>> each thread takes between 25 and 30 seconds to complete, so
>>>>>> transmission times should be small compared to processing times. With
>>>>>> this in mind, it is far cheaper to find 24-core machines than 80-core
>>>>>> ones. It is also advantageous to use some decommissioned equipment
>>>>>> that has perfectly usable CPU power (also note: memory is not a
>>>>>> limiting factor in this). These machines will all be in the same
>>>>>> server room; I don't plan on broadcasting cross-country for this.
>>>>>>
>>>>>>
>>>>>>
>>>>>> The way I have interpreted Akka is that each "thread" is run through
>>>>>> an actor rather than the ExecutorService. But I am wondering: how
>>>>>> does data get from a "master" and get routed out to another machine?
>>>>>> How capable is it for multi-machine distribution? And what is the
>>>>>> better approach to it: clustering, routing, or remoting? The master
>>>>>> actor can have a router attached; can that router broadcast to remote
>>>>>> actors? Should those remote receivers be the final actors, or another
>>>>>> router that then calls the workers on the local machine?
>>>>>>
>>>>>> I am still coming up to speed on how distributed computing and 
>>>>>> clustering function, so forgive me if anything seems trivial or 
>>>>>> irrelevant.
>>>>>>
>>>>>> Essentially, what I would like to see once everything is done is
>>>>>> 3-5 servers (plus the main one) crunching the data extrapolation
>>>>>> simultaneously, thus cutting my time drastically.
>>>>>>
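As a rough back-of-the-envelope check of that goal, the sketch below applies the "double the cores, half the time" observation to 3-5 extra servers. It assumes every extra server matches the 24-core main box and ignores transmission and coordination overhead, so treat the numbers as a best case; the class and method names are made up for illustration.

```java
public class SpeedupEstimate {

    // Estimated minutes if the phase scales linearly with the total core count.
    static double estimatedMinutes(double baselineMinutes, int baselineCores, int totalCores) {
        return baselineMinutes * baselineCores / totalCores;
    }

    public static void main(String[] args) {
        int coresPerServer = 24;      // assumption: extra servers match the main 24-core box
        double baselineMinutes = 30;  // extrapolation phase on the main box alone
        for (int extra = 3; extra <= 5; extra++) {
            int totalCores = coresPerServer * (1 + extra); // main box plus extra servers
            System.out.printf("%d extra servers (%d cores): ~%.1f min%n",
                    extra, totalCores,
                    estimatedMinutes(baselineMinutes, coresPerServer, totalCores));
        }
    }
}
```

Under those assumptions the 30-minute phase would drop to about 7.5 minutes with three extra servers and about 5 minutes with five.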
>>>>>> I will be using the 2.3-M2 code base for now. My program is OSGi
>>>>>> based, and this version of Akka was far more friendly, though it
>>>>>> still took some work to integrate.
>>>>>>
>>>>>> If I've read the documentation correctly, I can deploy the same
>>>>>> application to every machine and just tweak the config files. Will I
>>>>>> need an activator on the OSGi component for the remote machines? The
>>>>>> "master" program will only be on one machine; I just want to steal
>>>>>> the resources of another machine temporarily. Machines should also be
>>>>>> able to come and go from service, though I think the config files
>>>>>> handle that as well... correct? Or have I misinterpreted all of that,
>>>>>> and I need a special app on the remote machines?
>>>>>>
>>>>>> I know it's kind of long-winded; I just wanted to be thorough. Thanks
>>>>>> for reading, and thanks for the help.
>>>>>>
>>>>>> Matt
>>>>>>
>>>>>
