Hi,

Just wanted to report a possible bug.  We recently upgraded two Globus
installations (4.0.3->4.0.6, and 4.0.4->4.0.6), and I noticed that when GRAM
jobs were submitted, the jobs would fail.  I checked and all the files seem
to get staged in properly (executable & input files), but the executable
segfaults when you try to run it by hand.  I have tried this with multiple
different executables, and I get the same result each time.  If I manually
SCP or even globus-url-copy the executable over, it runs just fine.
However, if I use rft... -h <service_host> to move the executable over, as
would happen in a GRAM job submission, I end up with a corrupted binary.  As
a sanity check, we rolled back to 4.0.4 and everything works just fine.  So,
as far as the corruption... diff says the binaries are different:

seil:whatever$ diff garli garli_broken
Binary files garli and garli_broken differ
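To see how widespread the corruption is, a byte-level comparison helps (sketched here with small stand-in files rather than the real binaries; the actual files are garli and garli_broken from the RFT staging directory):

```shell
# Stand-ins for the good and corrupted binaries
printf 'hello world\n' > good_copy
printf 'hellX world\n' > bad_copy

# Checksums confirm the two copies differ
md5sum good_copy bad_copy

# cmp -l prints every differing byte (offset, old octal value, new octal
# value), which shows whether the damage is localized or spread throughout
cmp -l good_copy bad_copy
```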

If I run gdb on the broken one, I get tons of these messages:

BFD:
/a/storage.seil.umd.edu/export/home/seil/globus/.globus/scratch/whatever/garli_broken:
invalid string offset 1811940244 >= 315067 for section ` .strtab'
BFD:
/a/storage.seil.umd.edu/export/home/seil/globus/.globus/scratch/whatever/garli_broken:
invalid string offset 2684355476 >= 315067 for section ` .strtab'
BFD:
/a/storage.seil.umd.edu/export/home/seil/globus/.globus/scratch/whatever/garli_broken:
invalid string offset 2818573205 >= 315067 for section ` .strtab'

followed by:

Dwarf Error: wrong version in compilation unit header (is 0, should be 2)
[in module /a/storage.seil.umd.edu/export/home/seil/globus/.globus/scratch
/whatever/garli_broken]
Using host libthread_db library "/lib/tls/libthread_db.so.1".

Running it, of course, yields a segfault:

(gdb) run
Starting program:
/a/storage.seil.umd.edu/export/home/seil/globus/.globus/scratch/whatever/garli_broken
warning: shared library handler failed to enable breakpoint

Program received signal SIGSEGV, Segmentation fault.
0xf6081c1a in ?? ()

So... any ideas?  As you can see, I've isolated the problem to an RFT
transfer using the 4.0.6 container as the RFT service.  The primary reason
we upgraded in the first place was to avoid a memory leak in the PBS job
manager, so if anyone has an idea for a workaround, that would be helpful
too.  We are happy to provide additional information about our setup &
config if that would be useful.

Thanks,
Adam