This is odd. The code appears to be missing the 'delivered' line, but
that doesn't seem reasonable. You say you made some changes to the
seg_sge_module.c file for 4.2 compatibility. Have these changes worked
before, or is this all new investigation? I'd like to see what you had
to fix. Could you send me your seg_sge_module.c?
As for the gt4/ARCO mismatch - I've wanted to find/develop a solution
for this problem for a while but haven't been able to devote any time to
it. One simple solution would be to have a small script/daemon read the
sge reporting file and create a second file that is read by the
dbwriter. That way the original reporting file is maintained. Would
you like to collaborate on putting together/testing something like that?
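
A minimal sketch of such a relay, in Python, under stated assumptions: the
second file's name and the polling interval are hypothetical, and how
dbwriter would be pointed at the copy is not covered here. It only appends
complete lines from the original reporting file to the copy and never
modifies the original.

```python
import os
import time


def relay_new_lines(src_path, dst_path, offset):
    """Append complete lines added to src_path since `offset` to dst_path.

    Returns the offset to resume from on the next call.
    """
    if not os.path.exists(src_path):
        return offset
    # If the source is now shorter than our offset, it was truncated or
    # recreated, so start reading from the beginning again.
    if os.path.getsize(src_path) < offset:
        offset = 0
    with open(src_path, "rb") as src:
        src.seek(offset)
        chunk = src.read()
    # Forward whole lines only; a partial trailing line waits for the
    # next poll so the copy never contains a half-written record.
    end = chunk.rfind(b"\n") + 1
    if end > 0:
        with open(dst_path, "ab") as dst:
            dst.write(chunk[:end])
    return offset + end


def run(src_path, dst_path, interval=5.0):
    """Poll forever, relaying new records every `interval` seconds."""
    offset = 0
    while True:
        offset = relay_new_lines(src_path, dst_path, offset)
        time.sleep(interval)

# Hypothetical usage (paths are assumptions, not tested against ARCO):
# run("/path/to/common/reporting", "/path/to/common/reporting.dbwriter")
```

The offset is kept in memory only, so a restart would re-copy the whole
file; a real daemon would probably persist it.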
Thanks, Jeff
Korambath, Prakashan wrote:
Hi Jeff,
The reporting file looks ok to me. I just submitted one job and below
is the output. Is there an alternative to the reporting file for
someone who is also running ARCO's dbwriter?
Prakashan
1226006078:new_job:1226006078:29:-1:NONE:sge_job_script.20845:ppk:staff::defaultdepartment:sge:1024
1226006078:job_log:1226006078:pending:29:-1:NONE::ppk:grid4.ats.ucla.edu:0:1024:1226006078:sge_job_script.20845:ppk:staff::defaultdepartment:sge:new job
1226006081:job_log:1226006081:sent:29:0:NONE:t:master:grid4.ats.ucla.edu:0:1024:1226006078:sge_job_script.20845:ppk:staff::defaultdepartment:sge:sent to execd
1226006081:job_log:1226006081:delivered:29:0:NONE:r:master:grid4.ats.ucla.edu:0:1024:1226006078:sge_job_script.20845:ppk:staff::defaultdepartment:sge:job received by execd
1226006092:acct:all.q:grid4.ats.ucla.edu:staff:ppk:sge_job_script.20845:29:sge:0:1226006078:1226006081:1226006091:0:0:10:0.111982:0.059990:0.000000:0:0:0:0:18747:0:0:0.000000:0:0:0:0:219:85:NONE:defaultdepartment:NONE:1:0:0.171972:0.000000:0.000000:NONE:0.000000:NONE:127770624.000000:0:0
1226006092:job_log:1226006092:finished:29:0:NONE:r:execution daemon:grid4.ats.ucla.edu:0:1024:1226006078:sge_job_script.20845:ppk:staff::defaultdepartment:sge:job exited
1226006092:job_log:1226006092:finished:29:0:NONE:r:master:grid4.ats.ucla.edu:0:1024:1226006078:sge_job_script.20845:ppk:staff::defaultdepartment:sge:job waits for schedds deletion
1226006093:host:grid4.ats.ucla.edu:1226006093:X:cpu=1.200000,np_load_avg=0.150000,mem_free=7214.328125M,virtual_free=15215.441406M
1226006096:job_log:1226006096:deleted:29:0:NONE:T:scheduler:grid4.ats.ucla.edu:0:1024:1226006078:sge_job_script.20845:ppk:staff::defaultdepartment:sge:job deleted by schedd
-----Original Message-----
From: Jeff Porter [mailto:[EMAIL PROTECTED]
Sent: Thu 11/6/2008 1:12 PM
To: Korambath, Prakashan
Cc: [EMAIL PROTECTED]; Jin, Kejian; [EMAIL PROTECTED]
Subject: Re: [gt-user] Issues with Globus Toolkit 4.2 GRAM and SGE-SEG
with SGE 6.2; job status is always unsubmitted
Hi Prakashan,
When you run your test with the SEG_SGE_DEBUG level set, what
corresponding entries do you see in the reporting file? Either 'tail -f'
the file or grep it for "job_log" and the job ID.
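
The same filtering can be sketched in Python. The field positions used here
(record type in field 1, event time in field 2, event name in field 3, job ID
in field 4, counting from 0) are an assumption inferred from the sample
reporting lines posted in this thread, not taken from the SGE documentation.

```python
def job_log_events(lines, job_id):
    """Yield (event_time, event_name) for job_log records of one job.

    Field positions are inferred from sample output, not from a spec.
    """
    wanted = str(job_id)
    for line in lines:
        fields = line.rstrip("\n").split(":")
        if len(fields) > 4 and fields[1] == "job_log" and fields[4] == wanted:
            yield int(fields[2]), fields[3]


# Sample records copied from the reporting output quoted in this thread.
sample = [
    "1226006078:new_job:1226006078:29:-1:NONE:sge_job_script.20845:ppk:staff::defaultdepartment:sge:1024",
    "1226006078:job_log:1226006078:pending:29:-1:NONE::ppk:grid4.ats.ucla.edu:0:1024:1226006078:sge_job_script.20845:ppk:staff::defaultdepartment:sge:new job",
    "1226006081:job_log:1226006081:sent:29:0:NONE:t:master:grid4.ats.ucla.edu:0:1024:1226006078:sge_job_script.20845:ppk:staff::defaultdepartment:sge:sent to execd",
    "1226006081:job_log:1226006081:delivered:29:0:NONE:r:master:grid4.ats.ucla.edu:0:1024:1226006078:sge_job_script.20845:ppk:staff::defaultdepartment:sge:job received by execd",
]

for when, event in job_log_events(sample, 29):
    print(when, event)
```

Splitting on ":" is enough for these records because the fields that are
read come before any free-text message at the end of the line.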
BTW: ARCO's dbwriter does delete the reporting file as its checkpoint
mechanism, so that's still an incompatibility with gt4.
thanks, Jeff
Korambath, Prakashan wrote:
>
> Hi,
>
> I am trying to sort out some issues with integrating Globus Toolkit
> 4.2 and SGE 6.2 SEG. Some of the issues have already been answered on
> the mailing list, and I have followed those answers and they work
> correctly, but I still have at least a couple of issues.
>
> For example, the command below
>
> 1. globusrun-ws -debug -batch -submit -o job_epr -factory
> "globushostname" -Ft SGE -f sleep.xml
> submits and runs the job ok, but the command below
>
>
> 2. globusrun-ws -debug -status -job-epr-file job_epr
>
> This command always returns status Unsubmitted, even when the job is
> long gone.
>
> Current job state: Unsubmitted
>
> I checked the $SGE_ROOT/$SGE_CELL/common/reporting file and found
> that it disappears when SGE's ARCO dbwriter is also running. For
> testing purposes I stopped postgresql and stopped ARCO from doing
> anything to that file. So now the file is there, but SEG is still
> not getting updates like pending, finished etc.
> Everything is fine with Fork, so there is some problem with SGE-SEG.
>
> I also set
>
> export SEG_SGE_DEBUG=3 and ran
> /home/globus/gt4.2.1/libexec/globus-scheduler-event-generator -s sge
> -t 1225815907
>
>
> globus_l_sge_split_into_fields()
> globus_l_sge_split_into_fields(): exit success
> New event: job 28 now pending
> freeing fields
> globus_l_sge_parse_events() exits
> globus_l_sge_clean_buffer() called
> globus_l_sge_split_into_fields()
> globus_l_sge_split_into_fields(): exit success
> New event: job 28 now completed
> freeing fields
> globus_l_sge_split_into_fields()
> globus_l_sge_split_into_fields(): exit success
>
>
> So the scheduler event generator seems to get the status. My
> suspicion is that something is missing in the file seg_sge_module.c.
> I already have the changes mentioned here:
>
> http://www.globus.org/toolkit/docs/4.2/4.2.0/execution/gram4/developer/scheduler-tutorial-seg.html
>
> I wonder what else is missing.
>
>
> Prakashan
>
>
>