Hi John,
Yes, this is getting quite long - but hopefully useful.
So, regarding (1) below:
We are currently using a custom logger class which closely echoes the
log4Net logger, in that it has logger.debug, warn and info methods. All
it does is raise an event so that the event handler can deal with the
actual logging. So, in the executor for example, when
running in service-mode or normal mode, the actual service-code /
application GUI code handles this log event and does something
appropriate. In both cases, we just use the log4Net library to write the
message to a log file. Now, we could have simply used the log4Net
library inside the Alchemi core dll as well. However, I thought it was
better to have this extra abstraction, so that if we decide to use some
other logging library, it would be easy to change: we would just need to
change it in one place, in the GUI app / service-app, instead of all
over the core library. Also, this means the Alchemi core library is
actually independent of any external logging library. So John, any help
would be useful :)
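To make the abstraction above concrete, here is a minimal sketch of the
idea, written in Python rather than C# purely for brevity. All names
(CoreLogger, attach_host_logging) are hypothetical and are not Alchemi's
actual classes; Python's standard logging module stands in for log4Net.
The core logger only raises events, and the host is the one place that
knows about the concrete logging library.

```python
import logging
from typing import Callable, List

# Hypothetical names throughout; this is not Alchemi's actual logger class.
LogHandler = Callable[[str, str], None]  # called with (level, message)


class CoreLogger:
    """Logger used inside the core library. It only raises events."""

    def __init__(self) -> None:
        self._handlers: List[LogHandler] = []

    def subscribe(self, handler: LogHandler) -> None:
        """Register a host-side handler for log events."""
        self._handlers.append(handler)

    def _raise(self, level: str, message: str) -> None:
        # Forward the event to every subscriber; no file I/O happens here.
        for handler in self._handlers:
            handler(level, message)

    def debug(self, message: str) -> None:
        self._raise("DEBUG", message)

    def info(self, message: str) -> None:
        self._raise("INFO", message)

    def warn(self, message: str) -> None:
        self._raise("WARN", message)


def attach_host_logging(core_logger: CoreLogger, name: str = "host") -> logging.Logger:
    """Host-side wiring (GUI app or service): the only code that depends
    on the concrete logging library."""
    backend = logging.getLogger(name)
    levels = {"DEBUG": logging.DEBUG, "INFO": logging.INFO, "WARN": logging.WARNING}
    core_logger.subscribe(lambda level, msg: backend.log(levels[level], msg))
    return backend
```

Swapping the logging library would then mean rewriting only
attach_host_logging; nothing in the core library would change.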
At the risk of repeating myself:
The Gridbus broker is an interesting project. In fact, part of it is
like the Alchemi manager, but a bit more advanced. It has 3 or 4
different scheduling algorithms, which are the results of our research
at GRIDS lab, at Melbourne University. It includes an economy-based
scheduler, which I mentioned a while back...and also a data-aware
scheduler which tries to optimise data transfer by choosing a
compute-server (or executor) close to a datahost (which hosts the data
needed by a grid app), based on their network proximity among other
things. (Of course, this data scheduler is aimed at grid apps that need
data on the order of gigabytes.) For more info (just in case someone
has missed it) you may want to check out: http://www.gridbus.org/broker
We are thinking of borrowing some ideas from the broker to incorporate
into new schedulers in Alchemi's manager.
(And yes, both the broker and Alchemi are actually part of my day job
;). (And John, congrats and all the best with MS...:)
Regarding (2):
Yes, I think if a Manager goes down while a GThread is executing on an
Executor, the Executor does not know what to do with the GThread.
In fact, I am not really sure what would happen. I guess it would just
throw an exception, and perhaps even bring down the entire Executor
application. (Hmm... I need to check the code to see what could
happen; I haven't really tested that one.)
Also, I just wanted to clarify that an executor can only be connected to
one manager at any point in time.
And I am not really sure the term "executor" in the idea you mentioned
below has the same meaning as the executor in Alchemi at present. But
yes, we could have that separate layer to handle appdomain lifetimes.
Cheers
Krishna.
Wow, this thread is starting to get long but I think a lot of good
details are coming to light and are being recorded for posterity in
the mailing list. Now if we could just compile it into one resource. :D
If I put my comments inline they will be hard to read, so I will try to
query/respond by section number.
1.) Krishna, I think that I may be able to help with this. I have run
into similar issues with logging in secondary appdomains. I don't
have the code sitting in front of me, so excuse my ignorance, but what
are you using for logging currently? No sweat about not having time
for Alchemi, we all have day jobs and understand. I'm getting ready
to start at Microsoft out in Redmond at the end of March so I will be
fairly busy the next couple of months. I'll have to check out the
work you are doing with Grid Broker, sounds interesting.
2.) So Krishna, what is the behavior if we have a Manager that goes
belly up, or communication with the net is severed? Do all worker
nodes of that manager leave the appdomain hanging until the Executor
is shut down? If an executor is connected to several managers, and the
GApplications being run on that worker node on the Managers' behalf
are large, which may very well happen with the type of applications
that are suitable for grid enablement, this could become a prominent
issue. What I was proposing is as follows. The
ServiceManager/ExecutorController is what a Manager communicates with
on a worker node. This Controller will fire up and manage appdomains
based upon a number of Managers and then start an executor in that
appdomain. These 'executor' objects are MarshalByRef objects that have
a configurable lease lifetime. The Controller acts as a bridge between
the 'Manager' and the 'Executor', routing all calls to the 'Executor'.
Every time communication happens between the two, the lease lifetime
is extended. If a Manager drops offline or communication is cut for
whatever reason, the lease lifetime will expire for that 'Executor',
and the Controller will be notified by a delegate and then clean up
the 'abandoned' appdomain. This can also be initiated by the Manager
when it is done executing its work. As I see it, it is just another
layer of abstraction between the Manager and Executor that allows for
a little more robustness.
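A rough sketch of the lease idea in (2), in Python rather than C# and
with entirely hypothetical names (Lease, Controller, route_call, sweep).
It is not .NET Remoting's lease machinery: instead of the Controller
being notified by a delegate when a lease expires, this sketch assumes
the Controller periodically calls sweep() itself. The clock is
injectable only so the behaviour can be checked without waiting.

```python
import time
from typing import Callable, Dict, List


class Lease:
    """Expiry time that is pushed forward on every renewal."""

    def __init__(self, duration: float, clock: Callable[[], float]) -> None:
        self._duration = duration
        self._clock = clock
        self._expires_at = clock() + duration

    def renew(self) -> None:
        self._expires_at = self._clock() + self._duration

    def expired(self) -> bool:
        return self._clock() >= self._expires_at


class Controller:
    """Hypothetical bridge between Managers and per-Manager executors."""

    def __init__(self, lease_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lease_seconds = lease_seconds
        self._clock = clock
        self._leases: Dict[str, Lease] = {}
        # Stands in for unloading the abandoned appdomain.
        self.cleaned_up: List[str] = []

    def route_call(self, manager_id: str) -> None:
        """Any call from a Manager creates or renews that Manager's lease."""
        lease = self._leases.get(manager_id)
        if lease is None:
            self._leases[manager_id] = Lease(self._lease_seconds, self._clock)
        else:
            lease.renew()

    def release(self, manager_id: str) -> None:
        """Manager-initiated cleanup once it has finished its work."""
        if self._leases.pop(manager_id, None) is not None:
            self.cleaned_up.append(manager_id)

    def sweep(self) -> None:
        """Clean up executors whose Manager has gone quiet."""
        stale = [m for m, lease in self._leases.items() if lease.expired()]
        for manager_id in stale:
            self.release(manager_id)

    def active(self) -> List[str]:
        return sorted(self._leases)
```

With a 10 second lease, a Manager that last called at t=0 would be
cleaned up by a sweep at t=12, while one that called again at t=8 would
survive until t=18.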
I think I helped Tibor out with the threading issues he was facing.
Tibor, did that work for you?
3.) Krishna, I was thinking of a little longer caching lifetime for
dlls. So let's say one day Manager1 needs AppA to be executed on the
grid. You have to push the dlls of AppA down to each worker
node. You finish your work for that day, the manager notifies the
worker nodes that it no longer needs their services, and they clean up
any executable payload that was pushed to them. Next day Manager2
needs AppA to be executed on the grid. Follow the same exact steps as
the first day. Now shorten the time to 12 hours, 1 hour, 1 minute. A
lot of redundant bytes could be flying around the grid's topology. If
we could cache the Apps being pushed around on the worker nodes and
have a manager check whether an App is already on a worker node before
pushing it, it would be less network intensive. Basically, a manager
pushes an app to the controller on a worker node only if it doesn't
already reside there. Then, when the 'Executor' is loading up the App
for the 'Manager', it is pulled from this central repository, which is
in effect multiple folders, one per app, and copied to a shadow
directory which the 'Executor's' appdomain path points to. This allows
for multiple versions of the same app to be run side by side in
different app domains.
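The caching scheme in (3) could look roughly like the sketch below. It
is Python rather than C#, and every name in it (AppRepository, has_app,
push, copy_to_shadow, the "AppA-1.0" style ids) is an assumption made
for illustration. It assumes an app is a flat set of files and that the
app id includes the version, so different versions get different
folders.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict


class AppRepository:
    """Hypothetical per-worker-node cache: one folder per app."""

    def __init__(self, root: Path) -> None:
        self._root = root
        self._root.mkdir(parents=True, exist_ok=True)

    def has_app(self, app_id: str) -> bool:
        """What a manager would ask before pushing any bytes."""
        return (self._root / app_id).is_dir()

    def push(self, app_id: str, files: Dict[str, bytes]) -> bool:
        """Store the app's files unless already cached.
        Returns True only if bytes were actually written."""
        if self.has_app(app_id):
            return False
        app_dir = self._root / app_id
        app_dir.mkdir()
        for name, data in files.items():
            (app_dir / name).write_bytes(data)
        return True

    def copy_to_shadow(self, app_id: str, shadow_dir: Path) -> Path:
        """Copy the cached app into an executor's own shadow directory."""
        target = shadow_dir / app_id
        shutil.copytree(self._root / app_id, target)
        return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = AppRepository(Path(tmp) / "repo")
        print(repo.push("AppA-1.0", {"app.dll": b"v1"}))  # first push writes
        print(repo.push("AppA-1.0", {"app.dll": b"v1"}))  # second is skipped
```

Because each executor loads from its own shadow copy, two versions of
the same app can sit in the repository and run side by side without
touching each other's files.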
4.) Security. Krishna I agree with everything you said. You would
want exactly that level of control of security.
5.) Krishna, if I am reading between the lines correctly it would
almost seem that you are talking about some sort of P2P overlay
topology for the grid. If this is correct, I think that it is a fantastic
idea. I have been involved in a couple of P2P apps and I would be
glad to lend a hand with implementation for Alchemi. This would allow
for clustering of resources. It would also allow for pushing through
firewalls and all manner of network nastiness that can happen. But,
like I alluded to above, I'll be busy until probably mid-May getting
settled in with my new employer. After that I would be happy to
contribute to the project.
Have a great day,
John
_______________________________________________
alchemi-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/alchemi-users