[ 
https://issues.apache.org/jira/browse/TS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553856#comment-13553856
 ] 

Tomasz Kuzemko commented on TS-1648:
------------------------------------

The patch fixes the segfault, but ATS then falls into an infinite loop, 
logging lines like these continuously:


[Jan 15 15:35:00.489] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning 
0x7ffb36ff4c90 tag 0 boffset 0 b 0x7ffb36ff4c90 p (nil) l 1
[Jan 15 15:35:00.489] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e370fb0 tag 0 boffset 0 b 0x7ffe1e370fb0 p (nil) l 1
[Jan 15 15:35:00.489] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning 
0x7ffb36ff4cb8 tag 0 boffset 0 b 0x7ffb36ff4cb8 p (nil) l 1
[Jan 15 15:35:00.489] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e370fd8 tag 0 boffset 0 b 0x7ffe1e370fd8 p (nil) l 1
[Jan 15 15:35:00.489] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning 
0x7ffb36ff4ce0 tag 0 boffset 0 b 0x7ffb36ff4ce0 p (nil) l 1
[Jan 15 15:35:00.489] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e371000 tag 0 boffset 0 b 0x7ffe1e371000 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning 
0x7ffb36ff4d08 tag 0 boffset 0 b 0x7ffb36ff4d08 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e371028 tag 0 boffset 0 b 0x7ffe1e371028 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning 
0x7ffb36ff4d30 tag 0 boffset 0 b 0x7ffb36ff4d30 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e371050 tag 0 boffset 0 b 0x7ffe1e371050 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning 
0x7ffb36ff4d58 tag 0 boffset 0 b 0x7ffb36ff4d58 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e371078 tag 0 boffset 0 b 0x7ffe1e371078 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning 
0x7ffb36ff4d80 tag 0 boffset 0 b 0x7ffb36ff4d80 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e3710a0 tag 0 boffset 0 b 0x7ffe1e3710a0 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning 
0x7ffb36ff4da8 tag 0 boffset 0 b 0x7ffb36ff4da8 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e3710c8 tag 0 boffset 0 b 0x7ffe1e3710c8 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning 
0x7ffb36ff4dd0 tag 0 boffset 0 b 0x7ffb36ff4dd0 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e3710f0 tag 0 boffset 0 b 0x7ffe1e3710f0 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning 
0x7ffb36ff4df8 tag 0 boffset 0 b 0x7ffb36ff4df8 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e371118 tag 0 boffset 0 b 0x7ffe1e371118 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning 
0x7ffb36ff4e20 tag 0 boffset 0 b 0x7ffb36ff4e20 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e371140 tag 0 boffset 0 b 0x7ffe1e371140 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning 
0x7ffb36ff4e48 tag 0 boffset 0 b 0x7ffb36ff4e48 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning 
0x7ffb36ff4e70 tag 0 boffset 0 b 0x7ffb36ff4e70 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e371168 tag 0 boffset 0 b 0x7ffe1e371168 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning 
0x7ffb36ff4e98 tag 0 boffset 0 b 0x7ffb36ff4e98 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e371190 tag 0 boffset 0 b 0x7ffe1e371190 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning 
0x7ffb36ff4ec0 tag 0 boffset 0 b 0x7ffb36ff4ec0 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e3711b8 tag 0 boffset 0 b 0x7ffe1e3711b8 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning 
0x7ffb36ff4ee8 tag 0 boffset 0 b 0x7ffb36ff4ee8 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e3711e0 tag 0 boffset 0 b 0x7ffe1e3711e0 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning 
0x7ffb36ff4f10 tag 0 boffset 0 b 0x7ffb36ff4f10 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e371208 tag 0 boffset 0 b 0x7ffe1e371208 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning 
0x7ffb36ff4f38 tag 0 boffset 0 b 0x7ffb36ff4f38 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e371230 tag 0 boffset 0 b 0x7ffe1e371230 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning 
0x7ffb36ff4f60 tag 0 boffset 0 b 0x7ffb36ff4f60 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e371258 tag 0 boffset 0 b 0x7ffe1e371258 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning 
0x7ffb36ff4f88 tag 0 boffset 0 b 0x7ffb36ff4f88 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e371280 tag 0 boffset 0 b 0x7ffe1e371280 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning 
0x7ffb36ff4fb0 tag 0 boffset 0 b 0x7ffb36ff4fb0 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e3712a8 tag 0 boffset 0 b 0x7ffe1e3712a8 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning 
0x7ffb36ff4fd8 tag 0 boffset 0 b 0x7ffb36ff4fd8 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e3712d0 tag 0 boffset 0 b 0x7ffe1e3712d0 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e3712f8 tag 0 boffset 0 b 0x7ffe1e3712f8 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e371320 tag 0 boffset 0 b 0x7ffe1e371320 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e371348 tag 0 boffset 0 b 0x7ffe1e371348 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e371370 tag 0 boffset 0 b 0x7ffe1e371370 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e371398 tag 0 boffset 0 b 0x7ffe1e371398 p (nil) l 1
[Jan 15 15:35:00.490] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning 
0x7ffe1e3713c0 tag 0 boffset 0 b 0x7ffe1e3713c0 p (nil) l 1
...
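
To judge whether the loop revisits the same entries or keeps walking 
forward, it can help to group the dir_clean lines by thread and look at the 
distance between consecutive addresses. The following is only a rough sketch 
for doing that with the excerpt above; the helper name strides_by_thread and 
the regular expression are my own, not part of ATS.

```python
import re
from collections import defaultdict

# Matches one dir_clean debug line: captures the thread id and the entry address.
LINE_RE = re.compile(
    r"Server \{(0x[0-9a-f]+)\} DEBUG: \(dir_clean\) cleaning\s+(0x[0-9a-f]+)"
)

def strides_by_thread(log_text):
    """Return {thread_id: set of deltas between consecutive cleaned addresses}."""
    addrs = defaultdict(list)
    # Collapse all whitespace first so lines wrapped by the mailer still match.
    for thread, addr in LINE_RE.findall(" ".join(log_text.split())):
        addrs[thread].append(int(addr, 16))
    return {
        thread: {b - a for a, b in zip(seq, seq[1:])}
        for thread, seq in addrs.items()
    }

# First four log entries from the excerpt, wrapped exactly as in the mail.
sample = """
[Jan 15 15:35:00.489] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning
0x7ffb36ff4c90 tag 0 boffset 0 b 0x7ffb36ff4c90 p (nil) l 1
[Jan 15 15:35:00.489] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning
0x7ffe1e370fb0 tag 0 boffset 0 b 0x7ffe1e370fb0 p (nil) l 1
[Jan 15 15:35:00.489] Server {0x7ffff20ad700} DEBUG: (dir_clean) cleaning
0x7ffb36ff4cb8 tag 0 boffset 0 b 0x7ffb36ff4cb8 p (nil) l 1
[Jan 15 15:35:00.489] Server {0x7ffff22af700} DEBUG: (dir_clean) cleaning
0x7ffe1e370fd8 tag 0 boffset 0 b 0x7ffe1e370fd8 p (nil) l 1
"""

for thread, deltas in strides_by_thread(sample).items():
    print(thread, sorted(hex(d) for d in deltas))
```

In these four entries each thread advances by 0x28 (40) bytes per line, so at 
least within that sample the addresses move forward rather than repeating. 
Whether the walk ever terminates cannot be told from the excerpt alone.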

                
> Segmentation fault in dir_clear_range()
> ---------------------------------------
>
>                 Key: TS-1648
>                 URL: https://issues.apache.org/jira/browse/TS-1648
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 3.3.0, 3.2.0
>         Environment: reverse proxy
>            Reporter: Tomasz Kuzemko
>            Assignee: weijin
>         Attachments: 
> 0001-Fix-for-TS-1648-Segmentation-fault-in-dir_clear_rang.patch
>
>
> I use ATS as a reverse proxy with a fairly large disk cache consisting of 
> 2x 10TB raw disks. I do not use cache compression. After a few days of 
> running (this is a dev machine that handles no traffic), ATS begins to 
> crash with a segfault shortly after start:
> [Jan 11 16:11:00.690] Server {0x7ffff2bb8700} DEBUG: (rusage) took rusage 
> snap 1357917060690487000
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffff20ad700 (LWP 17292)]
> 0x0000000000696a71 in dir_clear_range (start=640, end=17024, vol=0x16057d0) 
> at CacheDir.cc:382
> 382   CacheDir.cc: No such file or directory.
>       in CacheDir.cc
> (gdb) p i
> $1 = 214748365
> (gdb) l
> 377   in CacheDir.cc
> (gdb) p dir_index(vol, i)
> $2 = (Dir *) 0x7ff997a04002
> (gdb) p dir_index(vol, i-1)
> $3 = (Dir *) 0x7ffa97a03ff8
> (gdb) p *dir_index(vol, i-1)
> $4 = {w = {0, 0, 0, 0, 0}}
> (gdb) p *dir_index(vol, i-2)
> $5 = {w = {0, 0, 52431, 52423, 0}}
> (gdb) p *dir_index(vol, i)
> Cannot access memory at address 0x7ff997a04002
> (gdb) p *dir_index(vol, i+2)
> Cannot access memory at address 0x7ff997a04016
> (gdb) p *dir_index(vol, i+1)
> Cannot access memory at address 0x7ff997a0400c
> (gdb) p vol->buckets * DIR_DEPTH * vol->segments
> $6 = 1246953472
> (gdb) bt
> #0  0x0000000000696a71 in dir_clear_range (start=640, end=17024, 
> vol=0x16057d0) at CacheDir.cc:382
> #1  0x000000000068aba2 in Vol::handle_recover_from_data (this=0x16057d0, 
> event=3900, data=0x16058a0) at Cache.cc:1384
> #2  0x00000000004e8e1c in Continuation::handleEvent (this=0x16057d0, 
> event=3900, data=0x16058a0) at ../iocore/eventsystem/I_Continuation.h:146
> #3  0x0000000000692385 in AIOCallbackInternal::io_complete (this=0x16058a0, 
> event=1, data=0x135afc0) at ../../iocore/aio/P_AIO.h:80
> #4  0x00000000004e8e1c in Continuation::handleEvent (this=0x16058a0, event=1, 
> data=0x135afc0) at ../iocore/eventsystem/I_Continuation.h:146
> #5  0x0000000000700fec in EThread::process_event (this=0x7ffff36c4010, 
> e=0x135afc0, calling_code=1) at UnixEThread.cc:142
> #6  0x00000000007011ff in EThread::execute (this=0x7ffff36c4010) at 
> UnixEThread.cc:191
> #7  0x00000000006ff8c2 in spawn_thread_internal (a=0x1356040) at Thread.cc:88
> #8  0x00007ffff797e8ca in start_thread () from /lib/libpthread.so.0
> #9  0x00007ffff55c6b6d in clone () from /lib/libc.so.6
> #10 0x0000000000000000 in ?? ()
> Running "traffic_server -Kk" to clear the cache makes the crash go away, but 
> the issue reappears after a few days.
> I will keep the current faulty setup as-is in case you need me to provide 
> more data. I tried to make a core dump, but it took a couple of GB even 
> after gzip (I can, however, provide it on request).
> *Edit*
> OS is Debian GNU/Linux 6.0.6 with custom built kernel 
> 3.2.13-grsec-xxxx-grs-ipv6-64
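
The addresses in the gdb session above are consistent with a signed 32-bit 
overflow in the byte offset i * (entry size): dir_index(vol, i) lands below 
dir_index(vol, i-1), and consecutive entries are 10 bytes apart 
(0x...400c - 0x...4002). The sketch below reproduces that arithmetic. It is an 
inference from the printed values only, not a confirmed root cause; DIR_SIZE, 
to_int32 and entry_addr are illustrative names, and the 32-bit wrap is an 
assumption about how the offset is computed.

```python
DIR_SIZE = 10  # bytes per Dir entry, inferred from the 10-byte address stride in gdb

def to_int32(x):
    """Wrap an integer to signed 32-bit two's complement."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

def entry_addr(base, i, wrap32):
    """Address of entry i; wrap32=True models a 32-bit signed byte offset (assumed)."""
    offset = i * DIR_SIZE
    if wrap32:
        offset = to_int32(offset)
    return base + offset

i = 214748365                 # $1 in the gdb session
addr_prev = 0x7FFA97A03FF8    # $3, dir_index(vol, i-1)
total_entries = 1246953472    # $6, buckets * DIR_DEPTH * segments

# Directory base implied by the last address that was still readable.
base = addr_prev - (i - 1) * DIR_SIZE

# Largest index whose byte offset still fits in a signed 32-bit int.
max_safe_index = (2**31 - 1) // DIR_SIZE

print(hex(entry_addr(base, i, wrap32=True)))    # address with the modelled wrap
print(hex(entry_addr(base, i, wrap32=False)))   # address with a 64-bit offset
print(max_safe_index, total_entries > max_safe_index)
```

With the wrap modelled, the computed address equals the unreadable 
0x7ff997a04002 reported by gdb, and i is exactly one past the largest index 
(214748364) whose offset fits in 32 bits, while the directory holds far more 
entries than that.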

