[
https://issues.apache.org/jira/browse/TS-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118126#comment-13118126
]
mohan_zl edited comment on TS-971 at 9/30/11 3:21 PM:
------------------------------------------------------
I now understand the cause of this bug, and the patch fixes it. With TS-970
and TS-971 both fixed, the cache feature works well.
The bug arises as follows. In Vol::aggWrite, the line "io.thread =
AIO_CALLBACK_THREAD_AIO" causes the ET_AIO thread to call the continuation
handler "Vol::aggWriteDone" directly, and the MUTEX_LOCK (in AIO.cc) sets
vol->mutex->thread_holding to the current ET_AIO thread, which is a DEDICATED
thread. If cache evacuation is enabled, things then go wrong:
CacheVC::evacuateDocDone calls do_read_call, which calls CacheVC::handleRead,
and in CacheVC::handleRead the line "io.thread = mutex->thread_holding" makes
the current ET_AIO thread the target of the asynchronous I/O. But ET_AIO is a
DEDICATED thread and can never take on that work, so the ink_assert reached
through EThread::schedule_imm_signal (called from AIO.cc) fails and ATS
crashes.
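
To make the failure mode concrete, here is a minimal, self-contained model of
it. This is only a sketch: the types below are simplified stand-ins that
borrow the names used above, not the real ATS classes, and
pick_callback_thread is a hypothetical helper, not what TS-evacuate-fix.patch
does. schedule_imm_signal returns false where the real code would hit the
ink_assert.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-ins for the ATS types named above (assumption: not the real classes).
enum ThreadType { REGULAR, DEDICATED };

struct EThread {
  ThreadType tt;
  int scheduled; // number of events queued on this thread

  explicit EThread(ThreadType t) : tt(t), scheduled(0) {}

  // Models the check from the backtrace, ink_assert(tt == REGULAR):
  // only a REGULAR thread accepts scheduled events.
  // Returns false where the real code would assert and abort.
  bool schedule_imm_signal() {
    if (tt != REGULAR) {
      return false;
    }
    ++scheduled;
    return true;
  }
};

struct ProxyMutex {
  EThread *thread_holding;
  ProxyMutex() : thread_holding(NULL) {}
};

// Hypothetical helper: never hand a callback to a DEDICATED thread.
// If the mutex holder is missing or DEDICATED, use the given REGULAR thread.
EThread *pick_callback_thread(const ProxyMutex &m, EThread *fallback_regular) {
  EThread *t = m.thread_holding;
  if (t == NULL || t->tt == DEDICATED) {
    return fallback_regular;
  }
  return t;
}
```

In this model, if thread_holding points at a DEDICATED thread, calling
schedule_imm_signal on it is rejected, which corresponds to the crash
described above. Routing the callback through pick_callback_thread with a
REGULAR thread as the fallback lets the event be queued.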
I think the division of labor among threads is not very clear in the ATS
code. For example, the AIO thread is a DEDICATED thread, so using it to call a
continuation handler, which should be done by a REGULAR thread, leads to bugs.
Besides, shouldn't epoll_wait be done by a special thread such as an ET_POLL
thread rather than by the ET_NET thread, which is actually a worker thread?
was (Author: wahu0315210):
I now understand the cause of this bug, and the patch fixes it. With TS-970
and TS-971 both fixed, the cache feature works well.
The bug arises as follows. In Vol::aggWrite, the line "io.thread =
AIO_CALLBACK_THREAD_AIO" causes the AIO thread to call the continuation
handler "Vol::aggWriteDone" directly, and the MUTEX_LOCK in AIO.cc sets
vol->mutex->thread_holding to the current AIO thread, which is a DEDICATED
thread. If cache evacuation is enabled, things then go wrong:
CacheVC::evacuateDocDone calls do_read_call, which calls CacheVC::handleRead,
and in CacheVC::handleRead the line "io.thread = mutex->thread_holding" makes
the current AIO thread the target of the asynchronous I/O. But AIO is a
DEDICATED thread and can never take on that work, so the ink_assert reached
through EThread::schedule_imm_signal (called from AIO.cc) fails and ATS
crashes.
I think the division of labor among threads is not very clear. For example,
the AIO thread is a DEDICATED thread, so using it to call a continuation
handler, which should be done by a REGULAR thread, leads to bugs. Besides,
shouldn't epoll_wait be done by a special thread such as a POLL thread rather
than by the ET_NET thread, which is actually a worker thread?
> Thread error in the cache evacuation feature
> --------------------------------------------
>
> Key: TS-971
> URL: https://issues.apache.org/jira/browse/TS-971
> Project: Traffic Server
> Issue Type: Bug
> Reporter: mohan_zl
> Attachments: TS-evacuate-fix.patch
>
>
> After fixing bug TS-970, I went on testing the cache evacuation feature
> with the same environment and test methods. This time Traffic Server
> crashes in a different, somewhat strange place in the code.
> {code}
> (gdb) bt
> #0 0x0000003639c30265 in raise () from /lib64/libc.so.6
> #1 0x0000003639c31d10 in abort () from /lib64/libc.so.6
> #2 0x00002b9258e7e6fa in ink_die_die_die (retval=Could not find the frame
> base for "ink_die_die_die".
> ) at ink_error.cc:43
> #3 0x00002b9258e7e979 in ink_fatal_va (return_code=Could not find the frame
> base for "ink_fatal_va".
> ) at ink_error.cc:65
> #4 0x00002b9258e7eb46 in ink_fatal (return_code=Could not find the frame
> base for "ink_fatal".
> ) at ink_error.cc:73
> #5 0x00002b9258e7c97a in _ink_assert (a=Could not find the frame base for
> "_ink_assert".
> ) at ink_assert.cc:44
> #6 0x00000000004f45df in EThread::schedule (this=0x2aaaabe9c010,
> e=0x2aaab4325e00, fast_signal=true)
> at ../../iocore/eventsystem/P_UnixEThread.h:96
> #7 0x00000000006496db in EThread::schedule_imm_signal (this=0x2aaaabe9c010,
> cont=0x302b948, callback_event=1, cookie=0x0)
> at ../../iocore/eventsystem/P_UnixEThread.h:62
> #8 0x00000000006c4427 in aio_thread_main (arg=0x2aaaac0d1820) at AIO.cc:528
> #9 0x00000000006c4afa in AIOThreadInfo::start (this=0x2aaaac0d1820, event=1,
> e=0x2a05650) at AIO.cc:188
> #10 0x00000000004d3789 in Continuation::handleEvent (this=0x2aaaac0d1820,
> event=1, data=0x2a05650) at I_Continuation.h:146
> #11 0x00000000006f705b in EThread::execute (this=0x2aaaabe9c010) at
> UnixEThread.cc:289
> #12 0x00000000006f6307 in spawn_thread_internal (a=0x2aaaac0d1870) at
> Thread.cc:88
> #13 0x000000363a8064a7 in start_thread () from /lib64/libpthread.so.0
> #14 0x0000003639cd3c2d in clone () from /lib64/libc.so.6
> (gdb) f 8
> #8 0x00000000006c4427 in aio_thread_main (arg=0x2aaaac0d1820) at AIO.cc:528
> 528 op->thread->schedule_imm_signal(op);
> (gdb) p *op
> $1 = {<Continuation> = {<force_VFPT_to_top> = {_vptr.force_VFPT_to_top =
> 0x760710},
> handler = 0x68a4f6 <AIOCallbackInternal::io_complete(int, void*)>,
> handler_name = 0x75dfb0 "&AIOCallbackInternal::io_complete", mutex = {
> m_ptr = 0x2aaaac261f70}, link = {<SLink<Continuation>> = {next = 0x0},
> prev = 0x0}}, aiocb = {aio_fildes = 38, aio_buf = 0x2aabd80db000,
> aio_nbytes = 3072, aio_offset = 3359854592, aio_reqprio = 0,
> aio_lio_opcode = 1, aio_state = 0, aio__pad = {0}}, action = {_vptr.Action =
> 0x0,
> continuation = 0x302b7c0, mutex = {m_ptr = 0x2aaaac261f70}, cancelled =
> 0}, thread = 0x2aaaabe9c010, then = 0x0, aio_result = 3072}
> (gdb) p ((CacheVC *)op->action->continuation)->io->aiocb
> $1 = {aio_fildes = 38, aio_buf = 0x2aabd80db000, aio_nbytes = 3072,
> aio_offset = 3359854592, aio_reqprio = 0, aio_lio_opcode = 1, aio_state = 0,
> aio__pad = {0}}
> (gdb) p ((CacheVC *)op->action->continuation)->handler_name
> $2 = 0x75f372 "&CacheVC::handleReadDone"
> (gdb) p ((CacheVC *)op->action->continuation)->f.evacuator
> $3 = 1
> (gdb) p ((CacheVC *)op->action->continuation)->save_handler
> $4 = 0x6afe9e <CacheVC::evacuateReadHead(int, Event*)>
> (gdb) f 6
> #6 0x00000000004f45df in EThread::schedule (this=0x2aaaabe9c010,
> e=0x2aaab4325e00, fast_signal=true)
> at ../../iocore/eventsystem/P_UnixEThread.h:96
> 96 ink_assert(tt == REGULAR);
> (gdb) p tt
> $2 = DEDICATED
> {code}