Re: [gt-user] Condor-g problems

Charles Bacon Mon, 26 Nov 2007 08:40:45 -0800

On Nov 26, 2007, at 10:28 AM, scott fletcher (BITS) wrote:

Thanks for the info, Charles,

If you set up the more recent implementation of
GRAM, which also works with Condor, you will get near-instantaneous
notification of job completion.


Do you mean just specifying a different GRAM (e.g. gt4) in the job
submission file, or is there some extra setup that is required?

You'd have to set up the GRAM4 server. That boils down to setting up
a backend database for RFT, standing up a GridFTP server for
stage-in/out, and starting the web services container. That's
sections 2.4-2.7 in the quickstart:
http://www.globus.org/toolkit/docs/4.0/admin/docbook/quickstart.html#q-gridftp

Then, yes, you just change your gt2 to a gt4 in the Condor submit
script, and change your contact from the gatekeeper on port 2119 to
https://hostname.whatever:8443/wsrf/services/ManagedJobFactoryService

If your install of GT was already set up for submission to Condor, the
web services Condor stuff should all be set up already too.
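
For illustration, the submit-file change might look roughly like the
sketch below. It assumes a Condor version that uses the grid_resource
syntax (older releases use a different keyword), a job manager named
jobmanager-condor on the gt2 side, and "Condor" as the scheduler type on
the gt4 side. The executable and the output, error, and log file names
are placeholders, not taken from this thread.

```
# Before: GRAM2 through the gatekeeper on port 2119
# (the jobmanager-condor name is an assumption)
# universe      = grid
# grid_resource = gt2 hostname.whatever:2119/jobmanager-condor

# After: GRAM4 through the web services container
# (the trailing "Condor" scheduler type is an assumption)
universe      = grid
grid_resource = gt4 https://hostname.whatever:8443/wsrf/services/ManagedJobFactoryService Condor

# Placeholder job details
executable    = /bin/hostname
output        = job.out
error         = job.err
log           = job.log
queue
```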



Charles


Thanks,

Scott

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Charles Bacon
Sent: 26 November 2007 15:10
To: scott fletcher (BITS)
Cc: [email protected]
Subject: Re: [gt-user] Condor-g problems

On Nov 26, 2007, at 5:59 AM, scott fletcher (BITS) wrote:

Problem 1
=========

...


At this point, even if we revert to submitting jobs directly to Condor,
we get the same message; the only thing that seems to fix it is a
reboot.


I don't have an idea about this one, and I suspect you'll have better
luck with it in a Condor forum. I am surprised, however, because I know
that in their architecture there's a separate daemon called the GAHP
that they use to offload their interactions with things like Globus.
The only thing I can think to suggest is to look whether a GAHP is up
and running at the time you experience this problem and try killing it.

Problem 2
=========
When we submit a job to the master node it gets there and runs as you
would expect and then exits; however, on the submission node the job
appears idle until about a minute after the job has actually finished
(on short jobs lasting 10 secs; we have not really tried any long ones
yet). It then shows status as running (which lasts several times as
long as the job actually took to run) and then exits.


This has to do with the architecture of GRAM2.  It polls for job
completion, and does so at a one-minute interval.  Condor-G
meddles with it to try to improve things, which I believe is the
poll_fast output you're seeing.  It sounds like the poll_fast isn't
speeding things up, and you're instead getting the default one-minute
interval polling.  If you set up the more recent implementation of
GRAM, which also works with Condor, you will get near-instantaneous
notification of job completion.
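
As a rough illustration of why a 10-second job can look unfinished for
about a minute, here is a toy model of the two approaches. It is not
GRAM or Condor-G code: the assumption that polls happen at exact
multiples of the interval after submission, and the one-second
notification delay, are both hypothetical.

```python
import math


def detection_time_polling(finish_time, poll_interval=60.0):
    """Return when a poller first notices the job is done.

    Assumes (hypothetically) that polls happen at t = poll_interval,
    2 * poll_interval, ... seconds after submission.
    """
    # The first poll at or after the finish time is the one that sees it.
    polls_needed = max(1, math.ceil(finish_time / poll_interval))
    return polls_needed * poll_interval


def detection_time_notification(finish_time, delivery_delay=1.0):
    """Return when a pushed notification arrives.

    delivery_delay is a made-up figure for illustration only.
    """
    return finish_time + delivery_delay


if __name__ == "__main__":
    job_length = 10.0  # seconds, like the short test jobs described above
    print("polling sees completion at", detection_time_polling(job_length))
    print("notification arrives at", detection_time_notification(job_length))
```

Under these assumptions, the polled case reports completion at the
first one-minute tick regardless of how short the job was, while the
notified case tracks the actual finish time.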


Charles


--

Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. This email and any attachments are believed to be free from viruses but BBSRC accepts no liability in connection therewith.
