Re: [gt-user] possible bug in RFT in 4.0.6

feller Wed, 27 Feb 2008 12:15:38 -0800

Adam,

I tried it and I can't reproduce it.
Just for sanity, please try the following: use the job description
below, which does a kind of dummy staging from headnode to headnode
(gram-host == rft-host == gridftp-host).
Does that work?
Does myEcho exist in $GLOBUS_USER_HOME afterwards?
Can you execute it?


Martin

job description (replace host and port values and path to "echo"
as needed):

<job>
    <executable>${GLOBUS_USER_HOME}/myEcho</executable>
    <argument>whatTheHeck</argument>
    <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
    <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
    <fileStageIn>
        <transfer>
            <sourceUrl>
               gsiftp://gridftp-host:port/bin/echo
            </sourceUrl>
            <destinationUrl>
               gsiftp://gridftp-host:port/${GLOBUS_USER_HOME}/myEcho
            </destinationUrl>
        </transfer>
    </fileStageIn>
</job>
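
The three checks above (did it work, does myEcho exist, can it be
executed) can also be scripted on the headnode. What follows is only a
minimal sketch, not part of the Globus Toolkit: the function names are
made up for illustration, it assumes the source file and the staged copy
are both visible on the same host, and it assumes ${GLOBUS_USER_HOME}
resolves to the local account's home directory.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large executables are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_staged_copy(source, staged):
    """Report whether `staged` exists, is executable, and matches `source`."""
    report = {"exists": os.path.isfile(staged), "executable": False,
              "same_size": False, "same_digest": False}
    if not report["exists"]:
        return report
    report["executable"] = os.access(staged, os.X_OK)
    report["same_size"] = os.path.getsize(source) == os.path.getsize(staged)
    # Only hash when the sizes agree; a size mismatch already proves a difference.
    report["same_digest"] = (report["same_size"]
                             and sha256_of(source) == sha256_of(staged))
    return report


if __name__ == "__main__":
    # Hypothetical paths: adjust to the "echo" path used in the job description.
    home = os.path.expanduser("~")  # assumed equal to ${GLOBUS_USER_HOME}
    print(check_staged_copy("/bin/echo", os.path.join(home, "myEcho")))
```

If "same_digest" comes back False for a file that scp or globus-url-copy
transfers intact, that would point at the staging step rather than at
the executable itself.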


> Hi,
>
> Just wanted to report a possible bug.  We recently upgraded two Globus
> installations (4.0.3->4.0.6, and 4.0.4->4.0.6), and I noticed that when
> GRAM jobs were submitted, the jobs would fail.  I checked and all the
> files seem to get staged in properly (executable & input files), but the
> executable segfaults when you try to run it by hand.  I have tried this
> with multiple different executables, and I get the same result each time.
> If I manually SCP or even globus-url-copy the executable over, it runs
> just fine.  However, if I use rft... -h <service_host> to move the
> executable over, as would happen in a GRAM job submission, I end up with
> a corrupted binary.  As a sanity check, we rolled back to 4.0.4 and
> everything works just fine.  So, as far as the corruption... diff says
> the binaries are different:
>
> seil:whatever$ diff garli garli_broken
> Binary files garli and garli_broken differ
>
> if i run gdb on the broken one, I get tons of these messages:
>
> BFD:
> /a/storage.seil.umd.edu/export/home/seil/globus/.globus/scratch/whatever/garli_broken:
> invalid string offset 1811940244 >= 315067 for section ` .strtab'
> BFD:
> /a/storage.seil.umd.edu/export/home/seil/globus/.globus/scratch/whatever/garli_broken:
> invalid string offset 2684355476 >= 315067 for section ` .strtab'
> BFD:
> /a/storage.seil.umd.edu/export/home/seil/globus/.globus/scratch/whatever/garli_broken:
> invalid string offset 2818573205 >= 315067 for section ` .strtab'
>
> followed by:
>
> Dwarf Error: wrong version in compilation unit header (is 0, should be 2)
> [in module /a/storage.seil.umd.edu/export/home/seil/globus/.globus/scratch/whatever/garli_broken]
> Using host libthread_db library "/lib/tls/libthread_db.so.1".
>
> running it, of course, yields a segfault:
>
> (gdb) run
> Starting program:
> /a/storage.seil.umd.edu/export/home/seil/globus/.globus/scratch/whatever/garli_broken
> warning: shared library handler failed to enable breakpoint
>
> Program received signal SIGSEGV, Segmentation fault.
> 0xf6081c1a in ?? ()
>
> So... any ideas?  As you can see I've isolated the problem to an RFT
> transfer using the 4.0.6 container as the RFT service.  The primary reason
> we upgraded in the first place was to avoid a memory leak with the PBS job
> manager, so... if anyone has an idea for a workaround, that would be
> helpful too.  We are happy to provide additional information about our
> setup & config if that would be helpful.
>
> Thanks,
> Adam
>
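
Since diff only reports that the two binaries differ, a byte-level
comparison may help narrow down what kind of corruption this is
(truncation, a single damaged region, or differences spread throughout).
The following is a rough sketch, not an RFT or GridFTP tool; the
function name is hypothetical and the two file names are simply the ones
from the report above.

```python
import os
import sys


def compare_bytes(good_path, bad_path, chunk_size=1 << 16):
    """Return (size_good, size_bad, first_diff_offset, differing_bytes).

    Only the overlapping prefix of the two files is compared byte by byte;
    first_diff_offset is None when that prefix is identical.
    """
    size_good = os.path.getsize(good_path)
    size_bad = os.path.getsize(bad_path)
    first_diff = None
    differing = 0
    offset = 0
    with open(good_path, "rb") as good, open(bad_path, "rb") as bad:
        while True:
            a = good.read(chunk_size)
            b = bad.read(chunk_size)
            if not a or not b:
                break  # reached the end of the shorter file
            for i, (x, y) in enumerate(zip(a, b)):
                if x != y:
                    differing += 1
                    if first_diff is None:
                        first_diff = offset + i
            offset += min(len(a), len(b))
    return size_good, size_bad, first_diff, differing


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # e.g. python compare.py garli garli_broken
        print(compare_bytes(sys.argv[1], sys.argv[2]))
    else:
        print("usage: compare.py GOOD_FILE BROKEN_FILE")
```

Equal sizes with a handful of differing bytes would suggest in-place
corruption, while a size mismatch would suggest a truncated or padded
transfer. Neither result identifies the cause on its own.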
