On Sunday, 20 July 2008 18:19:57, Ioan Raicu wrote:
Hi,
You are forgetting that in real Grid deployments, the majority of the
wait time will be queue wait time in batch schedulers. For example,
in some logs from SDSC that I looked at in 2005, I recall seeing queue wait
times of 6 hours on average over a one-year period. So, some extra
latency on the order of 1-60 seconds is not a big deal when your average
job lengths are hours or more.
This might be right for your use case. However, there are other use cases
in the grid world. We are running [EMAIL PROTECTED] as a task-farming
application on the resources of D-Grid, and we consume about 100,000
CPU hours per day, so it is really a production application. Because we are
submitting hundreds of jobs, the latency cannot be neglected, and it would be
really helpful to reduce it to below 1 second. If you look at the network
traffic caused by globusrun-ws -submit, you can see there are a lot of
communication cycles (I think there are 9) between the submitting and the
execution host. Is this really necessary? SOAP only requires one...
Alexander:
The number of communication cycles depends on how you use globusrun-ws.
E.g. the simple-looking command "globusrun-ws -submit -s -c /bin/true" is so
expensive because of the streaming of output. The steps performed are:
1. Get the values of the delegation endpoint resource properties from the server
2. Delegate credentials
3. Submit the job
4. Get the values of the stdoutUrl and stderrUrl RPs to find the URLs of the
files involved in streaming
5. Use GridFTP to stream the output to the client
6. Release the job from CleanupHold once streaming is done
7. In 4.0: destroy the subscription resource
8. Destroy the job resource once the Done/Failed notification is received
9. Destroy the delegated credentials
This does not include the notification messages sent by the server.
Also, RFT is used internally to remove the files that store the output
streamed back to the client.
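A rough back-of-the-envelope model shows why the number of round trips
dominates at this scale. All constants below are assumptions for
illustration, not measurements:

```python
# Illustrative model of per-job protocol overhead; every number is assumed.
RTT = 0.05          # assumed client/server round-trip time, seconds
ROUND_TRIPS = 9     # round trips of a streaming globusrun-ws submission
DELEGATION = 1.0    # assumed extra cost of per-job credential delegation, seconds
JOBS = 500          # assumed number of jobs submitted per day

per_job_overhead = ROUND_TRIPS * RTT + DELEGATION   # seconds of overhead per job
total_overhead = JOBS * per_job_overhead            # seconds of overhead per day
print(f"{per_job_overhead:.2f} s per job, {total_overhead / 60:.1f} min per day")
```

Under these assumptions the protocol overhead alone costs minutes per day,
and both the delegation term and most of the round trips drop out on the
non-streaming path described below.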
If you don't need streaming, steps 1, 2, 4, 5, 6 and 9 are not performed, and
no fileCleanUp has to be done in Gram.
If you can share a credential amongst jobs, no credential would have to be
delegated and destroyed per job.
Things you could think about:
* Can you use shared delegation?
* Do you need streaming, or can you stage the results out in a separate step?
* Polling for job status is more efficient than notifications.
Can you query the job state instead of subscribing for notifications?
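A minimal polling loop could look like the sketch below. Here
query_job_state is a hypothetical stand-in for a single resource-property
query on the job resource, not a real GRAM API call; the backoff keeps the
load on the container low:

```python
import time

def poll_until_done(query_job_state, interval=5.0, max_interval=60.0,
                    timeout=3600.0):
    """Poll a job's state instead of subscribing for notifications.

    query_job_state: hypothetical callable returning the job's state string
    (a stand-in for one GetResourceProperty-style request per poll).
    The polling interval backs off exponentially up to max_interval.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = query_job_state()
        if state in ("Done", "Failed"):
            return state
        time.sleep(interval)
        interval = min(interval * 2, max_interval)
    raise TimeoutError("job did not finish within the timeout")

# Usage with a stubbed state source:
states = iter(["Pending", "Active", "Done"])
final = poll_until_done(lambda: next(states), interval=0.0, timeout=5.0)
```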
And finally: globusrun-ws is maybe not the best approach for submitting
hundreds of jobs. We don't provide a client for large job submissions
ourselves. Is Condor-G an option?
Writing an efficient client for large job submissions that suits your use case
might be worth doing, but I guess it won't be trivial.
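Such a client could, for example, delegate one credential up front and reuse
it across concurrent submissions. The sketch below uses hypothetical
placeholders (submit_job, shared_credential) rather than real GRAM calls:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def submit_many(jobs, submit_job, shared_credential, max_workers=10):
    """Submit many jobs concurrently, reusing one delegated credential.

    submit_job(job, credential) is a hypothetical placeholder for a single
    submission; shared_credential stands in for a credential delegated once
    and shared by all jobs, so per-job delegation/destruction disappears.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(submit_job, job, shared_credential): job
                   for job in jobs}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results

# Usage with a stubbed submission function:
out = submit_many(range(5), lambda job, cred: f"{cred}:{job}", "shared-epr")
```

Concurrency hides the per-job round-trip latency, and the shared credential
removes the delegation steps entirely.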
If you are interested in this, I could point you to a testing program I
recently finished for 4.2 that can run all kinds of submission scenarios
against Gram.
That might be a starting point.
Martin
So please note: there is no "real Grid deployment" in the sense you
mentioned. I think this problem will become even more bothersome if a
scheduler such as Gridway comes into the game.
Cheers
Alexander