Hi John,

Yes, this is getting quite long - but hopefully useful.

So, regarding (1) below:
We are currently using a custom logger class which closely echoes the log4net logger, in that it has logger.debug, warn and info methods. All it does is raise an event so that the event handler can deal with the actual logging. So, in the executor for example, when running in service mode or normal mode, the actual service code / application GUI code handles this log event and does something appropriate. In both cases, we just use the log4net library to write the message to a log file. Now, we could have simply used the log4net library inside the Alchemi core dll as well. However, I thought it was better to have this extra abstraction, so that if we decide to use some other logging library, it would be easy to change: we would just need to change it in one place, in the GUI app / service app, instead of all over the core library. Also, this means the Alchemi core library is actually independent of any external logging library. So John, any help would be useful :)
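
To make the shape of that abstraction concrete, here is a minimal sketch of the pattern. It is written in Python purely for illustration and is not the actual Alchemi code; EventLogger, subscribe and the handler signature are hypothetical names.

```python
from typing import Callable, List, Tuple

# A handler receives (level, message) and decides what to do with it.
LogHandler = Callable[[str, str], None]


class EventLogger:
    """Hypothetical core-library logger: it only raises events, it never writes logs."""

    def __init__(self) -> None:
        self._handlers: List[LogHandler] = []

    def subscribe(self, handler: LogHandler) -> None:
        """Called by the host (GUI app or service) to receive log events."""
        self._handlers.append(handler)

    def _raise(self, level: str, message: str) -> None:
        # Iterate over a copy so a handler may subscribe others safely.
        for handler in list(self._handlers):
            handler(level, message)

    def debug(self, message: str) -> None:
        self._raise("DEBUG", message)

    def info(self, message: str) -> None:
        self._raise("INFO", message)

    def warn(self, message: str) -> None:
        self._raise("WARN", message)


# Host side: the only place that would know about a concrete logging library.
# Here the "library" is just an in-memory list, to keep the sketch self-contained.
written: List[Tuple[str, str]] = []


def host_handler(level: str, message: str) -> None:
    written.append((level, message))


logger = EventLogger()
logger.subscribe(host_handler)
logger.info("executor started")
logger.warn("manager not reachable")
```

Swapping the logging library would then mean changing host_handler only, which is the point of the extra layer.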

At the risk of repeating myself:
The Gridbus broker is an interesting project. In fact, part of it is like the Alchemi manager, but a bit more advanced. It has 3 or 4 different scheduling algorithms, which are the results of our research at the GRIDS lab at Melbourne University. It includes an economy-based scheduler, which I mentioned a while back, and also a data-aware scheduler which tries to optimise data transfer by choosing a compute server (or executor) close to a data host (which hosts the data needed by a grid app), based on their network proximity among other things. (Of course, this data scheduler is aimed at grid apps that need data on the order of gigabytes.) For more info (just in case someone has missed it) you may want to check out: http://www.gridbus.org/broker We are thinking of borrowing some ideas from the broker to incorporate into new schedulers in Alchemi's manager. (And yes, both the broker and Alchemi are actually part of my day job ;) (And John, congrats and all the best with MS :)

Regarding (2):
Yes, I think if a Manager goes down while a GThread is executing on an Executor, then the Executor does not know what to do with the GThread. In fact, I am not really sure what would happen. I guess it would just throw an exception, and perhaps even bring down the entire Executor application. (Hmm, I need to check the code to try and guess what could happen; I haven't really tested that one.) Also, I just wanted to clarify that an executor can only be connected to one manager at any point in time. And I am not really sure the term "executor" in the idea you mentioned below has the same meaning as the executor in Alchemi at present. But yes, we could have that separate layer to handle appdomain lifetimes.

Cheers
Krishna.

Wow, this thread is starting to get long, but I think a lot of good details are coming to light and are being recorded for posterity in the mailing list. Now if we could just compile it into one resource. :D

If I put my comments inline they will be hard to read, so I will try to query/respond by section number.

1.) Krishna, I think that I may be able to help with this. I have run into similar issues with logging in secondary appdomains. I don't have the code sitting in front of me, so excuse my ignorance, but what are you using for logging currently? No sweat about not having time for Alchemi; we all have day jobs and understand. I'm getting ready to start at Microsoft out in Redmond at the end of March, so I will be fairly busy the next couple of months. I'll have to check out the work you are doing with the Gridbus broker; it sounds interesting.

2.) So Krishna, what is the behavior if we have a Manager that goes belly up, or communication with the net is severed? Do all worker nodes of that manager leave the appdomain hanging until the Executor is shut down? If an executor is connected to several managers, and the GApplication being run on that worker node on a Manager's behalf is large, which may very well happen with the type of applications that are suitable for grid enablement, this could become a prominent issue.

What I was proposing is as follows. The ServiceManager/ExecutorController is what a Manager communicates with on a worker node. This Controller will fire up and manage appdomains, one per Manager, and then start an executor in each appdomain. These 'executor' objects are based upon MarshalByRef objects that have a configurable lease lifetime on them. The Controller acts as a bridge between the 'Manager' and the 'Executor', routing all calls to the 'Executor'. Every time communication happens between the two, the lease lifetime is extended. If a Manager drops offline or communication is cut for whatever reason, the lease lifetime will expire for that 'Executor', and the Controller will be notified by a delegate and then clean up the 'abandoned' appdomain. The cleanup can also be initiated by the Manager when it is done executing its work. As I see it, it is just another layer of abstraction between the Manager and Executor that allows for a little more robustness.
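
A rough sketch of the lease bookkeeping described above, in Python for illustration only. It does not use the .NET remoting lease API; ExecutorController, route_call, release and sweep are hypothetical names, and the clock is injected so that the behaviour can be checked without waiting.

```python
import time
from typing import Callable, Dict, List, Optional


class ExecutorController:
    """Hypothetical per-node controller tracking one lease per Manager."""

    def __init__(
        self,
        lease_seconds: float,
        clock: Callable[[], float] = time.monotonic,
        on_expired: Optional[Callable[[str], None]] = None,
    ) -> None:
        self._lease = lease_seconds
        self._clock = clock
        self._on_expired = on_expired
        # manager id -> time at which its executor's lease runs out
        self._expiry: Dict[str, float] = {}

    def start_executor(self, manager_id: str) -> None:
        """Stands in for creating an appdomain and starting an executor in it."""
        self._expiry[manager_id] = self._clock() + self._lease

    def route_call(self, manager_id: str) -> None:
        """Any Manager-to-Executor communication extends the lease."""
        if manager_id not in self._expiry:
            raise KeyError(f"no executor for manager {manager_id!r}")
        self._expiry[manager_id] = self._clock() + self._lease

    def release(self, manager_id: str) -> None:
        """Cleanup initiated by the Manager when its work is done."""
        self._expiry.pop(manager_id, None)

    def sweep(self) -> List[str]:
        """Clean up executors whose lease has run out; returns their manager ids."""
        now = self._clock()
        expired = [m for m, t in self._expiry.items() if t <= now]
        for manager_id in expired:
            del self._expiry[manager_id]
            if self._on_expired is not None:
                self._on_expired(manager_id)  # plays the role of the delegate
        return expired

    def active(self) -> List[str]:
        return sorted(self._expiry)
```

In a real controller sweep would run on a timer; here it is called explicitly, which keeps the sketch deterministic.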

I think I helped Tibor out with the threading issues he was facing. Tibor, did that work for you?

3.) Krishna, I was thinking of a somewhat longer caching lifetime for dlls. So let's say one day Manager1 needs AppA to be executed on the grid. You have to push the dlls of AppA down to each worker node. You finish your work for that day, the manager notifies the worker nodes that it no longer needs their services, and they clean up any executable payload that was pushed to them. The next day Manager2 needs AppA to be executed on the grid, and follows the exact same steps as on the first day. Now shorten the time to 12 hours, 1 hour, 1 minute. A lot of redundant bytes could be flying around the grid's topology. If we could cache the apps being pushed around on the worker nodes, and have a manager check whether an app is already on a worker node before pushing it, it would be less network intensive. Basically, all managers push an app to the controller on a worker node only if it doesn't already reside there. Then, when the 'Executor' is loading up the app for the 'Manager', it is pulled from this central repository, which is in effect multiple folders, one per app, and copied to a shadow directory which the 'Executor' appdomain's path points to. This allows multiple versions of the same app to run side by side in different appdomains.
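
The check-before-push and shadow-copy steps could look roughly like this. This is only a sketch in Python under my own assumptions: AppCache, push and shadow_copy are hypothetical names, and keying the folders by app id plus version is an assumption made to allow side-by-side versions.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict


class AppCache:
    """Hypothetical per-node repository: one folder per app (and version)."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def _folder(self, app_id: str, version: str) -> Path:
        return self.root / f"{app_id}-{version}"

    def has(self, app_id: str, version: str) -> bool:
        """What a manager would ask before pushing anything over the network."""
        return self._folder(app_id, version).is_dir()

    def push(self, app_id: str, version: str, files: Dict[str, bytes]) -> bool:
        """Store the payload unless it is already cached; True if bytes were written."""
        if self.has(app_id, version):
            return False
        folder = self._folder(app_id, version)
        folder.mkdir(parents=True)
        for name, data in files.items():
            (folder / name).write_bytes(data)
        return True

    def shadow_copy(self, app_id: str, version: str, shadow_root: Path) -> Path:
        """Copy the cached app to the directory an executor would load from."""
        target = Path(shadow_root) / f"{app_id}-{version}"
        shutil.copytree(self._folder(app_id, version), target, dirs_exist_ok=True)
        return target


# Usage: Manager1 pushes AppA, Manager2 finds it cached and skips the transfer.
with tempfile.TemporaryDirectory() as tmp:
    cache = AppCache(Path(tmp) / "cache")
    first = cache.push("AppA", "1.0", {"AppA.dll": b"payload"})
    second = cache.push("AppA", "1.0", {"AppA.dll": b"payload"})
    shadow = cache.shadow_copy("AppA", "1.0", Path(tmp) / "shadow" / "manager2")
    copied = (shadow / "AppA.dll").read_bytes()
```

Cache eviction (how long an app stays on a node) is deliberately left out of the sketch.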

4.) Security. Krishna, I agree with everything you said. You would want exactly that level of control over security.

5.) Krishna, if I am reading between the lines correctly, it would almost seem that you are talking about some sort of P2P overlay topology for the grid. If this is correct, I think that it is a fantastic idea. I have been involved in a couple of P2P apps and I would be glad to lend a hand with the implementation for Alchemi. This would allow for clustering of resources. It would also allow for pushing through firewalls and all manner of network nastiness that can happen. But, as I alluded to above, I'll be busy until probably mid-May getting settled in with my new employer. After that I would be happy to contribute to the project.

Have a great day,

John



_______________________________________________
alchemi-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/alchemi-users
