I have come across some odd results regarding the sort utility in coreutils 
version 8.20.  I've looked through the archives and don't see any similar 
issues so it may be something specific to our systems.

System:  SunOS 5.10 Generic_147440-26 sun4u sparc SUNW,Sun-Fire-V890

Issue:  When running sort on a 22.5 GB file, the process appears to hang in 
roughly 1 out of 10 runs (based on 100+ test runs).  The process is still 
running, but the temp files are no longer changing and the final file either 
has not been created or is a 0 byte file.  The temp files are never in exactly 
the same state from one hung run to the next.  On this machine a complete sort 
normally takes about 20 minutes.  On one occasion the process hung for over 48 
hours before I killed it.  Running top shows no significant load on the system 
while the process is hung.


Command run:

./sort -t\n -S 256M --batch-size=100 \
    -T /disk/craiwk01/prod/SORTWK -T /disk/craiwk02/prod/SORTWK \
    -T /disk/craiwk03/prod/SORTWK -T /disk/craiwk04/prod/SORTWK \
    -T /disk/craiwk06/prod/SORTWK -k1.1,1.10 infile -o infile.sorted



>: ps
   PID TTY         TIME CMD
 16328 pts/3       5:06 sort
 12697 pts/3       0:00 ps


>: sudo truss -rall -wall -f -p 16328
16328:  lwp_park(0x00000000, 0)         (sleeping...)


>: sudo pstack 16328
16328:  /usr/local/abacus/etsort/sort -tn -S 295063 --batch-size=100 -T /disk/
-----------------  lwp# 1 / thread# 1  --------------------
ffffffff7d4d8818 lwp_park (0, 0, 0)
0000000100009c74 sortlines (111b56580, 111c56080, ffffffff7fffeab0, 10012a321, ffffffff7fffead0, 10012a328) + 514
000000010000a5cc sortlines (111558380, 2, ffffffff7fffeab0, 1121765e0, 0, ffffffff7fffeab0) + e6c
000000010000a5cc sortlines (111956f80, 4, ffffffff7fffeab0, 112176420, 0, ffffffff7fffeab0) + e6c
000000010000a5cc sortlines (112154760, 8, ffffffff7fffeab0, 1121760a0, 1, ffffffff7fffeab0) + e6c
000000010000c070 sort (10012a740, 0, ffffffff7fffead0, 23, 10012cddd, 112154760) + 350
000000010000e6e8 main (13, ffffffff7ffff148, 0, 10012c220, fffd, 10012b1e0) + 1ee8
00000001000041bc _start (0, 0, 0, 0, 0, 0) + 7c
-----------------  lwp# 240 / thread# 240  --------------------
000000010000a600 sortlines_thread(), exit value = 0x0000000000000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 241 / thread# 241  --------------------
000000010000a600 sortlines_thread(), exit value = 0x0000000000000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 242 / thread# 242  --------------------
000000010000a600 sortlines_thread(), exit value = 0x0000000000000000
        ** zombie (exited, not detached, not yet joined) **


If I run sort single-threaded (by adding "--parallel=1" to the command above), 
it doesn't hang, which makes me think this is most likely a threading issue.  
The pstack output above points the same way: the main thread is parked in 
lwp_park inside sortlines, while the sortlines_thread workers shown have exited 
but have not been joined.  I ran the same tests on a Linux machine and it did 
not hang there, so the problem may be specific to Solaris.
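
For anyone who wants to measure the hang rate with and without "--parallel=1", 
here is a minimal sketch of a test harness.  run_trials is a hypothetical 
helper (not part of coreutils), and it assumes the timeout utility from GNU 
coreutils is on the PATH; timeout exits with status 124 when the time limit 
expires, and that status is what the sketch counts as a hang.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly under a time limit and
# print how many runs exceeded the limit (treated here as hangs).
# Usage: run_trials LIMIT_SECONDS RUNS COMMAND [ARGS...]
run_trials() {
    limit=$1
    runs=$2
    shift 2
    hung=0
    i=1
    while [ "$i" -le "$runs" ]; do
        status=0
        # Send the command's own stdout to stderr so that the only thing
        # this function writes to stdout is the final count.
        timeout "$limit" "$@" >&2 || status=$?
        # GNU timeout reports an expired time limit with exit status 124.
        if [ "$status" -eq 124 ]; then
            hung=$((hung + 1))
            echo "run $i: exceeded ${limit}s" >&2
        fi
        i=$((i + 1))
    done
    echo "$hung"
}
```

For example, "run_trials 3600 20 ./sort --parallel=1 [same options as above] 
infile -o infile.sorted" would print how many of 20 runs exceeded one hour, and 
the same call without "--parallel=1" would give the multi-threaded count.  Note 
that timeout kills the process when the limit expires, so a hung run cannot be 
inspected with pstack afterwards.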

I initially found this issue using coreutils 8.9 and I changed to 8.20 to see 
if a fix had been made but no luck.

Is this a known issue?  Are there any additional tests I should run to further 
narrow down this issue?

Thanks,

Jeff
