Hm, this is very strange. In the log file for the LSF job I can see that
not a single message from the LSF SEG ever reaches the Java code. That explains
what we see, but I don't know why it happens.
It looks as if either the LSF SEG died (though in that case we should see an
error in the log, and there is none), or the thread that runs the Java code
communicating with the SEG (which is also named SchedulerEventGenerator) died
(in that case we should also see an error in the log, but there is nothing).

(You can switch WS-GRAM debugging in the server off again.)

Please start the server and submit an LSF job.

Then please paste information about the processes of the GT server and the SEGs
("ps -ef | grep -i globus | grep -v grep" should give you that).

Please send me a thread dump of the GT server process.
("kill -QUIT <server-pid>"; the output is written to the server log file.)

Please send the output of
"ldd $GLOBUS_LOCATION/libexec/globus-scheduler-event-generator".
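The three requests above could be bundled into one small shell function so that everything lands in a single file. This is only a sketch: the function name collect_diagnostics is made up for illustration, it assumes GLOBUS_LOCATION points at the installation the GT4 server runs from, and it expects the server PID to be passed in by hand. As with "kill -QUIT <server-pid>", the thread dump itself ends up in the server log file, not in this output.

```shell
#!/bin/sh
# Hypothetical helper: gathers the diagnostics requested in the mail.
# Assumes GLOBUS_LOCATION is the installation used by the GT4 server and
# that the server PID (optional) is given as the first argument.
collect_diagnostics() {
    if [ -z "$GLOBUS_LOCATION" ]; then
        echo "GLOBUS_LOCATION is not set" >&2
        return 1
    fi

    echo "== GT server and SEG processes =="
    ps -ef | grep -i globus | grep -v grep

    if [ -n "$1" ]; then
        # SIGQUIT makes the JVM print a thread dump; it goes to the
        # server log file, not to this function's output.
        echo "== requested thread dump from PID $1 (see container.log) =="
        kill -QUIT "$1"
    fi

    echo "== shared libraries of the SEG binary =="
    ldd "$GLOBUS_LOCATION/libexec/globus-scheduler-event-generator"
}
```

Possible usage, after sourcing the file (the file name is again just an example): "collect_diagnostics <server-pid> > diagnostics.txt 2>&1".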

I hope this will tell me more.

Just to make sure:
When you started the SEG manually and saw it printing output, you used the SEG
from the same $GLOBUS_LOCATION that is used by the GT4 server we are talking
about, right? There isn't accidentally another GLOBUS_LOCATION around that might
cause some confusion? I vaguely remember that there was once a situation with
two Globus installations where the SEG didn't report anything, but I don't
remember any details...
It may be worth trying a clean re-install. That should be relatively quick to do
with a binary installer.
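One way to rule out a second installation is to compare the path the running SEG processes were started from with the SEG under $GLOBUS_LOCATION. The sketch below is hypothetical (the function name check_seg_location is invented here). It reads "ps -eo pid,args" output and assumes the container starts the SEG with an absolute path, so that the first word of the command line is the executable.

```shell
#!/bin/sh
# Hypothetical check: reads "PID ARGS..." lines (as printed by
# `ps -eo pid,args`) on stdin and flags every SEG process whose
# executable path differs from the SEG under $GLOBUS_LOCATION.
# Returns 1 if at least one mismatch was found, 0 otherwise.
check_seg_location() {
    expected="$GLOBUS_LOCATION/libexec/globus-scheduler-event-generator"
    status=0
    while read -r pid cmd rest; do
        case "$cmd" in
            *globus-scheduler-event-generator)
                if [ "$cmd" = "$expected" ]; then
                    echo "OK       $pid $cmd"
                else
                    echo "MISMATCH $pid $cmd"
                    status=1
                fi
                ;;
        esac
    done
    return $status
}
```

Possible usage: "ps -eo pid,args | check_seg_location". A SEG started with a relative path would also be reported as MISMATCH, so look at the path before drawing conclusions; on Linux, "readlink /proc/<pid>/exe" shows the binary a process is actually running.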

Thanks,

Martin

Löhnhardt, Benjamin wrote:
> Hi Martin,
> 
> I don't think I should send 5 MB to the mailing list, so the two resulting
> container.log files are just for you.
> 
> Regards,
> Benjamin
> 
> --
> 
> Benjamin Löhnhardt
> 
> UNIVERSITÄTSMEDIZIN GÖTTINGEN
> GEORG-AUGUST-UNIVERSITÄT 
> Abteilung Medizinische Informatik
> Robert-Koch-Straße 40
> 37075 Göttingen
> Postal address: 37099 Göttingen
> Phone +49-551 / 39-22842
> [email protected]
> www.mi.med.uni-goettingen.de
> 
> 
>> -----Original Message-----
>> From: Martin Feller [mailto:[email protected]]
>> Sent: Thursday, 5 August 2010 14:39
>> To: Löhnhardt, Benjamin
>> Cc: [email protected]
>> Subject: Re: AW: [gt-user] globus-ws with lsf does not work
>>
>> OK, that's odd. Right now I have no idea what might be going wrong.
>> If you have full control over the GT server, and it's not a production
>> system, please do this:
>>
>> 0. Uncomment the following line in $GLOBUS_LOCATION/container-log4j.properties
>>    # log4j.category.org.globus=DEBUG
>>
>> 1. Shut down the server
>> 2. Remove the server logfile $GLOBUS_LOCATION/var/container.log
>> 3. Remove the persistence directory
>> ~<userWhoStartsTheContainer>/.globus/persisted
>> 4. Restart the GT server as a daemon (globus-start-container-detached)
>> 5. Submit a simple batch job. No staging, no fileCleanUp please, just
>>    something simple like globusrun-ws -submit -c /bin/date
>> 6. Save the server logfile $GLOBUS_LOCATION/var/container.log
>>
>> Please do steps 1-6 for both a Fork and an LSF job, and send both log files.
>>
>> Martin
>>
>> Löhnhardt, Benjamin wrote:
>>> Hi Martin,
>>>
>>>>> Ah, why take the easy route if there is a complicated one...
>>>>> Somehow I was focused on your statement "... new LSF ..." and thought it
>>>>> used to work with old LSF or Fork. So maybe this:
>>> It works fine with Fork. The old LSF is not installed anymore, so I cannot
>>> test it. As both variants (Fork and LSF) use the same notification listener
>>> (I guess?), network configuration problems may not be the reason...
>>>
>>>>> To verify that: submit a job in batch/non-interactive mode and store
>>>>> the EPR of the job. Then poll for status.
>>> With LSF the job status remains "unsubmitted":
>>>
>>> -bash-3.1$ globusrun-ws -submit -b -o job.epr -F https://nimrod.med.uni-goettingen.de -Ft LSF -c /bin/date
>>> Submitting job...Done.
>>> Job ID: uuid:7e99a622-a061-11df-9d58-00215af48192
>>> Termination time: 08/06/2010 07:17 GMT
>>> -bash-3.1$ globusrun-ws -status -j job.epr
>>> Current job state: Unsubmitted
>>>
>>> ...but with Fork it is "done".
>>>
>>> -bash-3.1$ globusrun-ws -submit -b -o job.epr -F https://nimrod.med.uni-goettingen.de -Ft Fork -c /bin/date
>>> Submitting job...Done.
>>> Job ID: uuid:8b85d1b2-a061-11df-9fb3-00215af48192
>>> Termination time: 08/06/2010 07:17 GMT
>>> -bash-3.1$ globusrun-ws -status -j job.epr
>>> Current job state: Done
>>>
>>> Do you have an explanation for that strange behavior?
>>>
>>> Regards,
>>> Benjamin
