Re: [gt-user] possible bug in RFT in 4.0.6

Ravi Madduri Wed, 27 Feb 2008 12:03:36 -0800

Do you see any errors in the container logs? This does seem a bit weird.
On Feb 27, 2008, at 12:41 PM, Adam Bazinet wrote:

Hi,
Just wanted to report a possible bug. We recently upgraded two Globus installations (4.0.3->4.0.6, and 4.0.4->4.0.6), and I noticed that when GRAM jobs were submitted, the jobs would fail. I checked and all the files seem to get staged in properly (executable & input files), but the executable segfaults when you try to run it by hand. I have tried this with multiple different executables, and I get the same result each time. If I manually SCP or even globus-url-copy the executable over, it runs just fine. However, if I use rft... -h <service_host> to move the executable over, as would happen in a GRAM job submission, I end up with a corrupted binary. As a sanity check, we rolled back to 4.0.4 and everything works just fine. So, as far as the corruption... diff says the binaries are different:
seil:whatever$ diff garli garli_broken
Binary files garli and garli_broken differ
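
Since diff only reports that the two binaries differ, a byte-level comparison can show where the corruption starts and whether the file was truncated. The following is a minimal sketch in Python, not part of the Globus tooling; the function names (sha256_of, first_difference, report) are hypothetical helpers introduced here for illustration, and the file names are the ones from the diff above.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_difference(path_a, path_b, chunk_size=1 << 20):
    """Return the offset of the first differing byte, or None if identical.

    If one file is a strict prefix of the other, the returned offset is
    the length of the shorter file.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Locate the first mismatching byte inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No mismatch in the overlap: one chunk is shorter.
                return offset + min(len(a), len(b))
            if not a:
                # Both files reached end-of-file with no difference.
                return None
            offset += len(a)


def report(good, bad):
    """Print sizes, digests, and the first differing offset; return the offset."""
    for path in (good, bad):
        print(path, os.path.getsize(path), sha256_of(path))
    offset = first_difference(good, bad)
    if offset is None:
        print("files are identical")
    else:
        print("first difference at byte offset", offset)
    return offset
```

Calling report("garli", "garli_broken") in the scratch directory would print both sizes and digests plus the first differing offset. Matching sizes with an early differing offset would point at altered content, while a shorter broken file would point at truncation.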

If I run gdb on the broken one, I get tons of these messages:
BFD: /a/storage.seil.umd.edu/export/home/seil/globus/.globus/scratch/whatever/garli_broken: invalid string offset 1811940244 >= 315067 for section `.strtab'
BFD: /a/storage.seil.umd.edu/export/home/seil/globus/.globus/scratch/whatever/garli_broken: invalid string offset 2684355476 >= 315067 for section `.strtab'
BFD: /a/storage.seil.umd.edu/export/home/seil/globus/.globus/scratch/whatever/garli_broken: invalid string offset 2818573205 >= 315067 for section `.strtab'
followed by:
Dwarf Error: wrong version in compilation unit header (is 0, should be 2) [in module /a/storage.seil.umd.edu/export/home/seil/globus/.globus/scratch/whatever/garli_broken]
Using host libthread_db library "/lib/tls/libthread_db.so.1".

Running it, of course, yields a segfault:

(gdb) run
Starting program: /a/storage.seil.umd.edu/export/home/seil/globus/.globus/scratch/whatever/garli_broken
warning: shared library handler failed to enable breakpoint

Program received signal SIGSEGV, Segmentation fault.
0xf6081c1a in ?? ()
So... any ideas? As you can see I've isolated the problem to an RFT transfer using the 4.0.6 container as the RFT service. The primary reason we upgraded in the first place was to avoid a memory leak with the PBS job manager, so... if anyone has an idea for a workaround, that would be helpful too. We are happy to provide additional information about our setup & config if that would be helpful.
Thanks,
Adam


--
Ravi K Madduri

The Globus Alliance | Argonne National Laboratory | University of Chicago

http://www-unix.mcs.anl.gov/~madduri
