Hello again!

After some more research, we noticed that we keep getting the following debug messages in our container.log:

2007-06-13 13:45:30,009 DEBUG ManagedExecutableJobResource.5f9ea370-78d9-11db-b934-9189e81827f7 [Thread-3,remove:296] Waiting to be Done or Failed. Current state: FailureFileCleanUpResponse

I added the debug flag to the command mentioned below (globusrun-ws -submit -dbg -F hydra -s -c /bin/hostname) and I get messages like this (until I cancel the job):

...
debug: operation complete
Canceling...debug: starting to get gsiftp://hydra.gup.uni-linz.ac.at:2811/home/local/agrid/agp11092/dc6d8dec-19a3-11dc-9b58-0002a5e72f21.0.stderr
debug: sending command: ERET P 0 65536 /home/local/agrid/agp11092/dc6d8dec-19a3-11dc-9b58-0002a5e72f21.0.stderr
debug: response from gsiftp://hydra.gup.uni-linz.ac.at:2811/home/local/agrid/agp11092/dc6d8dec-19a3-11dc-9b58-0002a5e72f21.0.stderr: 125 Begining transfer; reusing existing data connection.
debug: reading into data buffer 0x812c480, maximum length 65536
debug: data callback, no error, buffer 0x812c480, length 0, offset=0, eof=true
debug: response from gsiftp://hydra.gup.uni-linz.ac.at:2811/home/local/agrid/agp11092/dc6d8dec-19a3-11dc-9b58-0002a5e72f21.0.stderr: 226 Transfer Complete.
debug: operation complete
debug: starting to get gsiftp://hydra.gup.uni-linz.ac.at:2811/home/local/agrid/agp11092/dc6d8dec-19a3-11dc-9b58-0002a5e72f21.0.stdout
debug: sending command: ERET P 6 65536 /home/local/agrid/agp11092/dc6d8dec-19a3-11dc-9b58-0002a5e72f21.0.stdout
debug: response from gsiftp://hydra.gup.uni-linz.ac.at:2811/home/local/agrid/agp11092/dc6d8dec-19a3-11dc-9b58-0002a5e72f21.0.stdout: 125 Begining transfer; reusing existing data connection.
debug: reading into data buffer 0x812c480, maximum length 65536
debug: data callback, no error, buffer 0x812c480, length 0, offset=6, eof=true
debug: response from gsiftp://hydra.gup.uni-linz.ac.at:2811/home/local/agrid/agp11092/dc6d8dec-19a3-11dc-9b58-0002a5e72f21.0.stdout: 226 Transfer Complete.
...

It's definitely not a problem on the client side, because when I execute exactly the same command (except for changing the factory contact to altix1) on the same host, with the same credentials, the command runs and terminates as expected.

We are kind of out of ideas, so if anybody could give us some directions we would be really grateful!

Regards,
Christoph Spielmann

Christoph Spielmann wrote:
> Hi everybody!
>
> After about a week of trial-and-error debugging work I decided to have
> some experts look at my problem. ;)
>
> First of all some background: We installed GT 4.0.4 on one of our
> clusters and, as far as I can remember, globusrun-ws was working when
> we first tried it (I asked one of my colleagues who was helping me with
> the installation, but he wasn't sure himself). After a while we
> noticed that some other components of GT weren't working as expected,
> and we figured the problem was that we had run gpt-postinstall as
> root and not as user globus while we were installing it, so we ran
> gpt-postinstall again as user 'globus'.
>
> Now every time we try to use globusrun-ws it hangs. globus-job-run, on
> the other hand, works perfectly.
>
> Here's the output when I try to run a job with globusrun-ws:
>
> globusrun-ws -submit -F hydra -s -c /bin/hostname
> Delegating user credentials...Done.
> Submitting job...Done.
> Job ID: uuid:1f363be0-182a-11dc-8b33-0002a5e72f21
> Termination time: 06/12/2007 14:43 GMT
> Current job state: Active
> hydra <-- it hangs here until I cancel the job.
>
> After cancelling the job it continues and I get this:
>
> Canceling...Canceled.
> Destroying job...Done.
> Cleaning up any delegated credentials...Done.
> globusrun-ws: Operation was canceled
>
> The output of globus-gatekeeper.log:
>
> TIME: Mon Jun 11 16:38:17 2007
> PID: 24648 -- Notice: 6: Got connection 140.78.104.101 at Mon Jun 11
> 16:38:17 2007
>
> Failed reading length 0
> GSS authentication failure
> globus_gss_assist token :3: read failure: Connection closed
> Failure: GSS failed Major:01090000 Minor:00000000 Token:00000003
>
> TIME: Mon Jun 11 16:38:17 2007
> PID: 24648 -- Failure: GSS failed Major:01090000 Minor:00000000
> Token:00000003
>
> I'll attach the container.log to this mail because it's a rather big file!
>
> Does anybody have a clue what could be wrong here?
>
> Regards,
>
> Christoph Spielmann
