Hi Jeff,

  You are right about the memory issue. I replaced the first argument in both
calls to globus_callback_register_oneshot with NULL, as you suggested, and the
SEG seems to be working OK after that. Job states are now reported as Active
and Done.  Thanks again for your help.

replaced
>
>     result = globus_callback_register_oneshot(
>             &logfile_state->callback,
>             &delay,
>             globus_l_sge_read_callback,
>             logfile_state);
>
with
    result = globus_callback_register_oneshot(
            NULL,
            &delay,
            globus_l_sge_read_callback,
            logfile_state);
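
A quick way to confirm that both call sites were changed is sketched below. The helper name list_oneshot_first_args is made up for illustration, and the sketch assumes each call is laid out as in the snippets above, with the first argument on the line right after the opening parenthesis.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SEG package): for every
# globus_callback_register_oneshot( call in a C source file, print the
# line number and text of the line that follows it, i.e. the first
# argument when the call is formatted one argument per line.
list_oneshot_first_args() {
    # $1 = path to the C source file to inspect
    awk '
        prev_is_call {
            # strip leading blanks and the trailing comma, then report
            gsub(/^[ \t]+|[ \t,]+$/, "")
            print NR ": " $0
            prev_is_call = 0
        }
        /globus_callback_register_oneshot\(/ { prev_is_call = 1 }
    ' "$1"
}
```

Running list_oneshot_first_args seg_sge_module.c should then list NULL for both calls once the change is in place.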

Prakashan




-----Original Message-----
From: Jeff Porter [mailto:[EMAIL PROTECTED]
Sent: Fri 11/7/2008 10:18 AM
To: Korambath, Prakashan
Cc: [EMAIL PROTECTED]; Jin, Kejian; [EMAIL PROTECTED]
Subject: Re: [gt-user] Issues with Globus Tookit 4.2 GRAM and SGE-SEG with SGE  
6.2; job status is always unsubmitted
 

Hi Prakashan,

Thanks for passing this on.   Like I said, I don't think it's a great 
solution (particularly on systems where the reporting file can get big), 
but it is a simple short-term one.

btw: I noticed I had a typo in the vdt link I sent (an "_" instead of a 
"-").  The link is


http://vdt.cs.wisc.edu/software/sge-jobmanager/1.1-p5-1//src/globus_scheduler_event_generator_sge-1.1.tar.gz
 


thanks again,

Jeff



Korambath, Prakashan wrote:
>
> Hi Jeff,
>
>   This is the suggestion from Richard on the SGE mailing list for the 
> Arco/dbwriter problem.  I suppose we can always point the log_path in
> $GLOBUS_LOCATION/etc/globus-sge.conf at any file we want.
>
> Prakashan
>
>
>
>
> Set up a cron job which duplicates the reporting file:
>
>
>   if [ ! -r $SGE_ROOT/$SGE_CELL/common/reporting_for_arco -a \
>        ! -r $SGE_ROOT/$SGE_CELL/common/reporting_for_globus ]; then
>
>       # move the current reporting file into a tmp file
>       # qmaster will recreate the reporting file soon
>       mv $SGE_ROOT/$SGE_CELL/common/reporting \
>            $SGE_ROOT/$SGE_CELL/common/reporting.tmp
>
>       # Append the reporting file to the reporting file for globus
>       cat $SGE_ROOT/$SGE_CELL/common/reporting.tmp \
>            >> $SGE_ROOT/$SGE_CELL/common/reporting_for_globus
>
>       # Rename the tmp reporting file; dbwriter will process it
>       mv $SGE_ROOT/$SGE_CELL/common/reporting.tmp \
>            $SGE_ROOT/$SGE_CELL/common/reporting_for_arco
>
>   fi
>
>
> In dbwriter.conf (the configuration file of dbwriter) the path to the
> reporting file is defined:
>
> % cat $SGE_ROOT/$SGE_CELL/common/dbwriter.conf
> ...
> #
> # File name of reporting file
> #
> DBWRITER_REPORTING_FILE=$SGE_ROOT/$SGE_CELL/common/reporting
> ...
>
> I hope that the path to the reporting file is not hard coded in globus.
>
> I have already used such a script for testing different database systems. The
> reporting file of one cluster was processed by two dbwriter instances:
> one was writing into a postgres database, the other into a mysql database.
>
> Richard
>
>
> -----Original Message-----
> From: Jeff Porter [mailto:[EMAIL PROTECTED]
> Sent: Thu 11/6/2008 3:22 PM
> To: Korambath, Prakashan
> Cc: [EMAIL PROTECTED]; Jin, Kejian; [EMAIL PROTECTED]
> Subject: Re: [gt-user] Issues with Globus Tookit 4.2 GRAM and SGE-SEG 
> with SGE  6.2; job status is always unsubmitted
>
>
> Hi Prakashan,
>
> You're right that changing the SGE code might be easier to maintain but
> I never thought of the two-file solution as a good one - just a quick
> one.    I did speak with one of the ARCO developers about changing the
> dbwriter but that didn't seem plausible from their end.  The other
> solution that seems more realistic is to have the SEG be able to get
> this information from different sources via some plugin - e.g. from
> reporting file, arco-db, something even lighter - depending on some flag
> in the globus_sge.conf file.
>
> The seg_pbs_module.c version is quite different since pbs has an
> internal logfile rotation mechanism that the seg understands.   When I
> compare 4.0.8 and 4.2.1 versions of the pbs_module, I only see the one
> change you've noted.
>
> I do know there is one memory leak with the LeSC version that has been
> fixed in the vdt version.  You might consider making that change. That LeSC
> version contains
>
>     result = globus_callback_register_oneshot(
>             &logfile_state->callback,
>             &delay,
>             globus_l_sge_read_callback,
>             logfile_state);
>
> However, if the 1st argument isn't null, the function makes a copy of
> the memory (it may even try to take ownership of the memory, I don't
> remember right now).  You can compare with the pbs version.  It occurs
> twice in the module but the leak is small.  Perhaps this causes
> additional problems in gt4.2?
>
> You can fix your version or grab the vdt version which includes this fix:
>
> http://vdt.cs.wisc.edu/software/sge-jobmanager/1.1-p5-1//src/globus_scheduler_event_generator_sge_1.1.tar.gz
>
> The vdt version also handles 'reporting' file rotation. It does not have
> the gt4.2 fix you mention here.
>
> Thanks, Jeff
>
>
> Korambath, Prakashan wrote:
> >
> > Hi Jeff,
> >
> > Regarding the Arco/gt4: Isn't it better if someone changes the SGE
> > source code to write an additional file, say seg-reporting or
> > something like that?  I can work with you on that, no problem here.  If
> > we can get the SGE developers to do that, then the changes will be in
> > their source code distribution.
> >
> > For the SEG update issue this is what I did:
> >
> >
> > I just modified the file from here
> > http://www.lesc.ic.ac.uk/projects/SGE-GT4.html
> >
> > globus_scheduler_event_generator_sge-1.1.tar.gz
> >
> > I saved the contents of someone else's post several weeks ago because
> > I thought it would be useful to me. 
> >
> > For everybody who's interested:
> > I just had to replace the section
> >
> > **********************************
> > globus_module_descriptor_t
> > globus_scheduler_event_module_ptr =
> > {
> >     "globus_scheduler_event_generator_sge",
> >     globus_l_sge_module_activate,
> >     globus_l_sge_module_deactivate,
> >     NULL,
> >     NULL,
> >     &local_version,
> >     NULL
> > };
> > *********************************
> >
> > in the seg_sge_module.c from the
> > globus_scheduler_event_generator_sge-1.1.tar.gz package with the
> > following:
> >
> > *********************************
> > GlobusExtensionDefineModule(globus_seg_sge) =
> > {
> >     "globus_seg_sge",
> >      globus_l_sge_module_activate,
> >      globus_l_sge_module_deactivate,
> >      NULL,
> >      NULL,
> >      &local_version
> >
> > };
> > **************************************
> >
> > Without the above change I was getting the error below.
> >
> > 2008-11-04T08:06:45.415-08:00 ERROR seg.SchedulerEventGenerator
> > [SEG-sge-Thread,run:230] SEG Terminated with
> > globus_scheduler_event_generator: Invalid module sge: activation failed
> > 2008-11-04T08:06:55.450-08:00 ERROR seg.SchedulerEventGenerator
> > [SEG-sge-Thread,run:230] SEG Terminated with
> > globus_scheduler_event_generator: Invalid module sge: activation failed
> > 2008-11-04T08:07:05.504-08:00 INFO  impl.DefaultIndexService
> > [ServiceThread-60,performDefaultRegistrations:261] 
> > guid=9fceec90-aa8a-11dd-9507-895ddbf3eafc
> > event=org.globus.mds.index.performDefaultRegistrations.end status=0
> > 2008-11-04T08:07:05.505-08:00 ERROR seg.SchedulerEventGenerator
> > [SEG-sge-Thread,run:230] SEG Terminated with
> > globus_scheduler_event_generator: Invalid module sge: activation failed
> >
> >
> > So I modified the seg_sge_module.c file and re-installed the event
> > generator
> >
> > gpt-build --force globus_scheduler_event_generator_sge-1.1.tar.gz gcc64dbg
> >
> > After gpt-postinstall the error went away.  I just compared the new
> > seg_pbs_module.c from the GT 4.2 distribution with the seg_sge_module.c
> > from London e-Science and am seeing a lot of differences.  Maybe I
> > should rewrite it according to the current seg_pbs_module.c.
> >
> > Prakashan
> >
> >
> > -----Original Message-----
> > From: Jeff Porter [mailto:[EMAIL PROTECTED]
> > Sent: Thu 11/6/2008 1:48 PM
> > To: Korambath, Prakashan
> > Cc: [EMAIL PROTECTED]; Jin, Kejian; [EMAIL PROTECTED]
> > Subject: Re: [gt-user] Issues with Globus Tookit 4.2 GRAM and SGE-SEG
> > with SGE  6.2; job status is always unsubmitted
> >
> >
> > This is odd. The code appears to be missing the 'delivered' line, but
> > that doesn't seem reasonable. You say you made some changes to the
> > seg_sge_module.c file for 4.2 compatibility. Have these changes worked
> > before or is this all a new investigation?  I'd like to see what you had
> > to fix. Could you send me your seg_sge_module.c?
> >
> > as for the gt4/ARCO mismatch - I've wanted to find/develop a solution
> > for this problem for a while but haven't been able to devote any time to
> > it.  One simple solution would be to have a small script/daemon read the
> > sge reporting file and create a second file that is read by the
> > dbwriter.  That way the original reporting file is maintained.   Would
> > you like to collaborate on putting together/testing something like that?
> >
> > Thanks, Jeff
> >
> > Korambath, Prakashan wrote:
> > >
> > > Hi Jeff,
> > >
> > > The reporting file looks ok to me.  I just submitted one job and below
> > > is the output.  Do we have another alternative for reporting file if
> > > someone is running Arco's dbwriter?
> > >
> > > Prakashan
> > >
> > >
> > >
> > 
> 1226006078:new_job:1226006078:29:-1:NONE:sge_job_script.20845:ppk:staff::defaultdepartment:sge:1024
> > >
> > 
> 1226006078:job_log:1226006078:pending:29:-1:NONE::ppk:grid4.ats.ucla.edu:0:1024:1226006078:sge_job_script.20845:ppk:staff::defaultdepartment:sge:new
> > > job
> > >
> > 
> 1226006081:job_log:1226006081:sent:29:0:NONE:t:master:grid4.ats.ucla.edu:0:1024:1226006078:sge_job_script.20845:ppk:staff::defaultdepartment:sge:sent
> > > to execd
> > >
> > 
> 1226006081:job_log:1226006081:delivered:29:0:NONE:r:master:grid4.ats.ucla.edu:0:1024:1226006078:sge_job_script.20845:ppk:staff::defaultdepartment:sge:job
> > > received by execd
> > >
> > 
> 1226006092:acct:all.q:grid4.ats.ucla.edu:staff:ppk:sge_job_script.20845:29:sge:0:1226006078:1226006081:1226006091:0:0:10:0.111982:0.059990:0.000000:0:0:0:0:18747:0:0:0.000000:0:0:0:0:219:85:NONE:defaultdepartment:NONE:1:0:0.171972:0.000000:0.000000:NONE:0.000000:NONE:127770624.000000:0:0
> > > 1226006092:job_log:1226006092:finished:29:0:NONE:r:execution
> > >
> > 
> daemon:grid4.ats.ucla.edu:0:1024:1226006078:sge_job_script.20845:ppk:staff::defaultdepartment:sge:job
> > > exited
> > >
> > 
> 1226006092:job_log:1226006092:finished:29:0:NONE:r:master:grid4.ats.ucla.edu:0:1024:1226006078:sge_job_script.20845:ppk:staff::defaultdepartment:sge:job
> > > waits for schedds deletion
> > >
> > 
> 1226006093:host:grid4.ats.ucla.edu:1226006093:X:cpu=1.200000,np_load_avg=0.150000,mem_free=7214.328125M,virtual_free=15215.441406M
> > >
> > 
> 1226006096:job_log:1226006096:deleted:29:0:NONE:T:scheduler:grid4.ats.ucla.edu:0:1024:1226006078:sge_job_script.20845:ppk:staff::defaultdepartment:sge:job
> > > deleted by schedd
> > >
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Jeff Porter [mailto:[EMAIL PROTECTED]
> > > Sent: Thu 11/6/2008 1:12 PM
> > > To: Korambath, Prakashan
> > > Cc: [EMAIL PROTECTED]; Jin, Kejian; [EMAIL PROTECTED]
> > > Subject: Re: [gt-user] Issues with Globus Tookit 4.2 GRAM and SGE-SEG
> > > with SGE  6.2; job status is always unsubmitted
> > >
> > > Hi Prakashan,
> > >
> > > When you run your test with the SEG_SGE_DEBUG level set, what
> > > corresponding entries do you see in the reporting file? Either
> > > 'tail -f' the file or grep on "job_log" and the job id.
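
The grep on "job_log" and the job id can be sketched as a small filter. This is only an illustration: the helper name job_log_events is made up, and it assumes colon-separated reporting lines in which field 2 is the record type, field 4 the event and field 5 the job id, as in the job_log lines quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print "timestamp event" for the job_log records
# of one job. Field positions are an assumption based on the sample
# reporting lines in this thread, not on SGE documentation.
job_log_events() {
    # $1 = path to the reporting file, $2 = numeric job id
    awk -F: -v id="$2" '$2 == "job_log" && $5 == id { print $1, $4 }' "$1"
}
```

For example, job_log_events "$SGE_ROOT/$SGE_CELL/common/reporting" 29 would list the events recorded for job 29 in order, which makes it easy to see whether pending, delivered and finished all reached the file.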
> > >
> > > BTW: ARCO's dbwriter does delete the reporting file as its checkpoint
> > > mechanism so that's still an incompatibility with gt4.
> > >
> > > thanks, Jeff
> > >
> > > Korambath, Prakashan wrote:
> > > >
> > > > Hi,
> > > >
> > > >   I am trying to sort out some issues with integrating Globus Toolkit
> > > > 4.2 and the SGE 6.2 SEG.  Some of the issues have already been answered
> > > > on the mailing list, and I have followed those answers and they work
> > > > correctly, but I am still having at least a couple of issues.
> > > >
> > > > For example, the command below
> > > >
> > > > 1. globusrun-ws -debug -batch -submit -o job_epr -factory
> > > > "globushostname" -Ft SGE -f sleep.xml
> > > >
> > > > submits and runs the job OK, but the command below
> > > >
> > > > 2. globusrun-ws -debug -status -job-epr-file job_epr
> > > >
> > > > always returns the status Unsubmitted, even when the job is long gone.
> > > >
> > > > Current job state: Unsubmitted
> > > >
> > > > I checked the $SGE_ROOT/$SGE_CELL/common/reporting file.
> > > > I found this file disappearing when SGE's ARCO dbwriter is also
> > > > running.  For testing purposes I stopped postgresql and stopped
> > > > ARCO from doing anything to that file. So now that file is there, but
> > > > the SEG is still not getting updates like pending, finished etc.
> > > > Everything is fine with Fork, so there is some problem with SGE-SEG.
> > > >
> > > > I also set
> > > >
> > > > export SEG_SGE_DEBUG=3 and ran
> > > > /home/globus/gt4.2.1/libexec/globus-scheduler-event-generator -s sge
> > > > -t 1225815907
> > > >
> > > >
> > > > globus_l_sge_split_into_fields()
> > > > globus_l_sge_split_into_fields(): exit success
> > > > New event: job 28 now pending
> > > > freeing fields
> > > > globus_l_sge_parse_events() exits
> > > > globus_l_sge_clean_buffer() called
> > > > globus_l_sge_split_into_fields()
> > > > globus_l_sge_split_into_fields(): exit success
> > > > New event: job 28 now completed
> > > > freeing fields
> > > > globus_l_sge_split_into_fields()
> > > > globus_l_sge_split_into_fields(): exit success
> > > >
> > > >
> > > > So the scheduler event generator seems to get the status.  My
> > > > suspicion is that something is missing in the file seg_sge_module.c.
> > > > I already have changes mentioned here
> > > >
> > > > http://www.globus.org/toolkit/docs/4.2/4.2.0/execution/gram4/developer/scheduler-tutorial-seg.html
> > > >
> > > > I wonder what else is missing.
> > > >
> > > >
> > > > Prakashan
> > > >
> > > >
> > > >
> > >
> >
> >
>
