RE: RE: [Alchemi-developers] Fault Tolerance in Alchemi

Tibor Biro Mon, 20 Mar 2006 06:08:10 -0800

Hi,

You present an interesting point of view. If I understand correctly, you see it working similarly to an active/passive clustering solution. I think we can easily implement it as a farm of Managers. The data synchronization could happen through the database. There are already proven methods of implementing failover at the database level, so we do not need to worry about that.

Here are a few ideas:

- Executors are configured with a list of Managers. An Executor would connect to one of them, chosen at random or by some other selection algorithm.

- The Managers would share one database, both to store incoming work and to fetch the threads to execute.

- Any Manager should be able to receive the results from the Executors, but this is an optional feature.
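To make the first idea concrete, here is a minimal sketch in Python rather than Alchemi's own code. It assumes an Executor holds a list of Manager addresses and tries them in random order until one accepts. The function name, the address strings and the `try_connect` callback are all hypothetical and do not come from the Alchemi API.

```python
import random


def connect_to_any_manager(managers, try_connect, rng=random):
    """Return the first Manager address that accepts a connection, or None.

    managers    -- configured list of Manager addresses (hypothetical format)
    try_connect -- callable(address) -> bool; stands in for the real
                   connection attempt, which is not shown here
    rng         -- source of randomness, injectable for testing
    """
    candidates = list(managers)   # copy so the configured list is not reordered
    rng.shuffle(candidates)       # random choice; any other ordering would also work
    for address in candidates:
        if try_connect(address):
            return address
    return None                   # no Manager in the farm is reachable


if __name__ == "__main__":
    reachable = {"manager-b:9000"}  # pretend only this Manager is up
    chosen = connect_to_any_manager(
        ["manager-a:9000", "manager-b:9000", "manager-c:9000"],
        lambda address: address in reachable,
    )
    print("connected to", chosen)
```

Because every Manager would read from and write to the same database, it should not matter which one the Executor ends up connected to.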

If the database is “in memory” then we should not worry about multiple Managers.

Having multiple Managers in the infrastructure could also present interesting network configuration options to the grid administrators.

The option to have hierarchical grids was scaled back in 1.0, but it is still a desirable feature. At the time it was decided that it was better to focus on the core features. As I see it, hierarchical grids are not a failover solution but a way to transfer work from one grid to another. Compared to a simple grid, hierarchical grids introduce more points of failure. Anyway, the code is still there, and if somebody took the time to see how it can be re-enabled and tested, it would be a great addition to the framework.

Regards,

Tibor

From: andrew hudson [mailto:[EMAIL PROTECTED]
Sent: Monday, March 20, 2006 4:11 AM
To: Tibor Biro
Cc: [email protected]
Subject: Re: RE: [Alchemi-developers] Fault Tolerance in Alchemi

Hello sir,
I have spent a lot of time looking into both of the problems related to fault tolerance in Alchemi. The first problem seems interesting, but it is a little complex and I have no idea where to start on it.
The second problem, "manager node failure", would be a new addition if we are able to solve it. The Alchemi code seems complex for a novice user like me.
I have studied various class files such as Gmanager, IManager, GExecutors, GThread..... but I still cannot work out which file contains the right material for the second problem.
I have a very good concept in mind for providing Alchemi with a backup server, but I am stuck at the implementation part. My idea is this:

1. Just as executors send heartbeats to the manager to report their status, the manager node can send heartbeats to a backup node to inform it that the manager is still alive.

2. The backup manager node will have all the information that the actual manager has.

3. In case of failure of the manager, the backup manager will come into play. What we have to do in this case is inform all the executor nodes that this is their new manager.

4. With this, we will be able to run our grid even in case of manager node failure.
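The four steps above can be sketched as a small heartbeat monitor. This is an illustration in Python under stated assumptions, not Alchemi code: the `BackupManager` class, the timeout value and the `notify` callback are hypothetical, and the copying of manager state from step 2 is left out.

```python
class BackupManager:
    """Watches heartbeats from the primary manager and takes over on silence."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds  # allowed silence before takeover
        self.last_heartbeat = None              # time of the latest heartbeat seen
        self.active = False                     # True once this node has taken over

    def on_heartbeat(self, now):
        """Step 1: the primary manager reports that it is still alive."""
        self.last_heartbeat = now

    def check(self, now, executors, notify):
        """Steps 3 and 4: take over if the primary has been silent too long.

        notify is a callable(executor) that tells one executor about its
        new manager. Returns True only at the moment of takeover.
        """
        if self.active or self.last_heartbeat is None:
            return False
        if now - self.last_heartbeat <= self.timeout_seconds:
            return False
        self.active = True
        for executor in executors:
            notify(executor)
        return True


if __name__ == "__main__":
    backup = BackupManager(timeout_seconds=10)
    backup.on_heartbeat(now=100)
    backup.check(now=105, executors=["exec-1"], notify=print)  # primary still alive
    backup.check(now=111, executors=["exec-1"], notify=print)  # takeover, prints exec-1
```

A real implementation would also need a way to stop a revived primary from competing with the promoted backup, which this sketch does not address.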

As for the first problem, I am looking for more ideas. Sir, please help me implement my ideas. Where should I start? I got confused after seeing so much code at one time.

One more thing I need help with is how to implement a multicluster grid using Alchemi. In the source code of Alchemi 1.0.3 I found that multicluster support was removed from Alchemi after version 1.0. Is that true? I would still like to implement the multicluster approach.

Eagerly waiting for your reply.
