On Sunday, 20 July 2008 18:19:57, Ioan Raicu wrote:
Hi,
You are forgetting that in real Grid deployments, the majority of the
wait time will be queue wait time in batch schedulers. For example,
in some logs from SDSC that I looked at in 2005, I recall seeing queue wait
times of 6 hours on average over a one-year period. So, some extra
latency on the order of 1-60 seconds is not a big deal when your average
job lengths are hours or more.
This might be right for your use case. However, there are other use cases
in the grid world. We are running [EMAIL PROTECTED] as a task-farming
application on the resources of D-Grid, and we consume about 100,000
CPU hours per day, so it is really a production application. Because we are
submitting hundreds of jobs, the latency cannot be neglected, and it would be
really helpful to reduce it to below 1 second. If you look at the network
traffic caused by globusrun-ws -submit, you can see there are a lot of
communication cycles (I think there are 9) between the submitting and the
execution host. Is this really necessary? SOAP only requires one...
Alexander:
The number of communication cycles depends on how you use globusrun-ws.
E.g. the simple-looking command "globusrun-ws -submit -s -c /bin/true" is so
expensive because of the streaming of output. The steps performed are:
1. Get the values of the delegation endpoint resource properties from the server
2. Delegate credentials
3. Submit the job
4. Get the values of the stdoutUrl and stderrUrl RPs to find the URLs of the
files involved in streaming
5. Use GridFTP to stream the output to the client
6. Release the job from CleanupHold once streaming is done
7. In 4.0: destroy the subscription resource
8. Destroy the job resource once the Done/Failed notification is received
9. Destroy the delegated credentials
This does not include the notification messages sent by the server.
Also, RFT is used internally to remove the files that store the output
streamed back to the client.
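A rough back-of-the-envelope model shows why the number of round trips
dominates at this scale. All constants below are assumptions for
illustration, not measurements:

```python
# Illustrative model of per-job protocol overhead; every number is assumed.
RTT = 0.05          # assumed client/server round-trip time, seconds
ROUND_TRIPS = 9     # round trips of a streaming globusrun-ws submission
DELEGATION = 1.0    # assumed extra cost of per-job credential delegation, seconds
JOBS = 500          # assumed number of jobs submitted per day

per_job_overhead = ROUND_TRIPS * RTT + DELEGATION   # seconds of overhead per job
total_overhead = JOBS * per_job_overhead            # seconds of overhead per day
print(f"{per_job_overhead:.2f} s per job, {total_overhead / 60:.1f} min per day")
```

Under these assumptions the protocol overhead alone costs minutes per day,
and both the delegation term and most of the round trips drop out on the
non-streaming path described below.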
If you don't need streaming, steps 1, 2, 4, 5, 6 and 9 are not performed, and
no fileCleanUp has to be done in Gram.
If you can share a credential amongst jobs, no credential would have to be
delegated and destroyed per job.
Things you could think about:
* Can you use shared delegation?
* Do you need streaming, or can you stage the results out in a separate step?
* Polling for job status is more efficient than notifications.
Can you query the job state instead of subscribing for notifications?
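A minimal polling loop could look like the sketch below. Here
query_job_state is a hypothetical stand-in for a single resource-property
query on the job resource, not a real GRAM API call; the backoff keeps the
load on the container low:

```python
import time

def poll_until_done(query_job_state, interval=5.0, max_interval=60.0,
                    timeout=3600.0):
    """Poll a job's state instead of subscribing for notifications.

    query_job_state: hypothetical callable returning the job's state string
    (a stand-in for one GetResourceProperty-style request per poll).
    The polling interval backs off exponentially up to max_interval.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = query_job_state()
        if state in ("Done", "Failed"):
            return state
        time.sleep(interval)
        interval = min(interval * 2, max_interval)
    raise TimeoutError("job did not finish within the timeout")

# Usage with a stubbed state source:
states = iter(["Pending", "Active", "Done"])
final = poll_until_done(lambda: next(states), interval=0.0, timeout=5.0)
```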
And finally: globusrun-ws is maybe not the best approach for submitting
hundreds of jobs. We don't provide a client for large job submissions
ourselves. Is Condor-G an option?
Writing an efficient client for large job submissions that suits your use case
might be worth doing, but I guess it won't be trivial.
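Such a client could, for example, delegate one credential up front and reuse
it across concurrent submissions. The sketch below uses hypothetical
placeholders (submit_job, shared_credential) rather than real GRAM calls:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def submit_many(jobs, submit_job, shared_credential, max_workers=10):
    """Submit many jobs concurrently, reusing one delegated credential.

    submit_job(job, credential) is a hypothetical placeholder for a single
    submission; shared_credential stands in for a credential delegated once
    and shared by all jobs, so per-job delegation/destruction disappears.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(submit_job, job, shared_credential): job
                   for job in jobs}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results

# Usage with a stubbed submission function:
out = submit_many(range(5), lambda job, cred: f"{cred}:{job}", "shared-epr")
```

Concurrency hides the per-job round-trip latency, and the shared credential
removes the delegation steps entirely.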
If you are interested in this, I could point you to a testing program I
recently finished for 4.2 that can run all kinds of submission scenarios
against Gram.
That might be a starting point.
Martin
So please note: there is no "real Grid deployment" in the sense you
mentioned. I think this problem will become even more bothersome if a
scheduler such as Gridway comes into the game.
Cheers
Alexander