Fix is available (it includes a regression test):

https://github.com/cruppstahl/hypertable/commit/614a7d5e34c254ffa77f8d29b456866e8d71bbec

Thanks
Christoph

2012/8/10 Christoph Rupp <[email protected]>

> OK, I can reproduce it. I will work on a fix until next Tuesday/Wednesday.
>
> Thanks
> Christoph
>
>
> 2012/8/9 BigQiao <[email protected]>
>
>> This deadlock still exists in 0.9.6.0 when deleting a TableScanner.
>>
>> The TableScanner destructor locks IndexScannerCallback, then
>> TableScannerAsync.
>> A database worker thread locks TableScannerAsync, then IndexScannerCallback.
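
The inversion described above (one thread takes the IndexScannerCallback mutex and then the TableScannerAsync mutex, the other takes them in the opposite order) can be sketched in isolation. This is only an illustration, not Hypertable code: `Callback`, `Scanner`, and `run_both_orders` are hypothetical names. It assumes C++17 and uses `std::scoped_lock`, which acquires several mutexes with a deadlock-avoidance algorithm, as one common way to make both paths safe regardless of the order in which they name the mutexes.

```cpp
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two objects whose mutexes are involved.
struct Callback { std::mutex m; int hits = 0; };
struct Scanner  { std::mutex m; int hits = 0; };

// Path of the destructor thread: names the callback mutex first.
void destructor_path(Callback &cb, Scanner &sc) {
  // std::scoped_lock locks both mutexes using a deadlock-avoidance
  // algorithm, so the order of the arguments does not matter.
  std::scoped_lock lock(cb.m, sc.m);
  ++cb.hits;
  ++sc.hits;
}

// Path of the worker thread: names the scanner mutex first.
void worker_path(Callback &cb, Scanner &sc) {
  std::scoped_lock lock(sc.m, cb.m);
  ++cb.hits;
  ++sc.hits;
}

// Runs both paths concurrently, `rounds` times each, and returns the
// total number of updates recorded on the callback.
int run_both_orders(int rounds) {
  Callback cb;
  Scanner sc;
  std::thread t1([&] { for (int i = 0; i < rounds; ++i) destructor_path(cb, sc); });
  std::thread t2([&] { for (int i = 0; i < rounds; ++i) worker_path(cb, sc); });
  t1.join();
  t2.join();
  return cb.hits;
}
```

If each path instead used two nested `std::lock_guard` objects in its own order, the same two threads could block each other forever, which is the pattern visible in the two traces below.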
>>
>> Thread 14 (Thread 0x7fffee266700 (LWP 10936)):      // database worker thread
>> #0  __lll_lock_wait () at
>> ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
>> #1  0x00007ffff79c8179 in _L_lock_953 () from /lib/libpthread.so.0
>> #2  0x00007ffff79c7f9b in __pthread_mutex_lock (mutex=0xc08630) at
>> pthread_mutex_lock.c:61
>> #3  0x0000000000477886 in boost::mutex::lock (this=0xc08630) at
>> /usr/include/boost/thread/pthread/mutex.hpp:50
>> #4  0x000000000047e790 in boost::unique_lock<boost::mutex>::lock
>> (this=0x7fffee2638e0) at /usr/include/boost/thread/locks.hpp:349
>> #5  0x000000000047d51d in unique_lock (this=0x7fffee2638e0, m_=...) at
>> /usr/include/boost/thread/locks.hpp:227
>> #6  0x00000000005f0a07 in
>> Hypertable::IndexScannerCallback::scan_ok(Hypertable::TableScannerAsync*,
>> boost::intrusive_ptr<Hypertable::ScanCells>&) ()
>> #7  0x00000000005ed180 in
>> Hypertable::TableScannerAsync::maybe_callback_ok (this=0x10e7b50,
>> scanner_id=1, next=true, do_callback=true, cells=...)
>>     at
>> /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerAsync.cc:522
>> #8  0x00000000005ec5cc in Hypertable::TableScannerAsync::handle_result
>> (this=0x10e7b50, scanner_id=1, event=..., is_create=true)
>>     at
>> /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerAsync.cc:459
>> #9  0x00000000006286d2 in Hypertable::TableScannerHandler::run
>> (this=0x7fffe8049e30) at
>> /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerHandler.cc:40
>> #10 0x000000000047b625 in
>> Hypertable::ApplicationQueue::Worker::operator()() ()
>> #11 0x000000000048dbd2 in
>> boost::detail::thread_data<Hypertable::ApplicationQueue::Worker>::run() ()
>> #12 0x00007ffff77b5200 in thread_proxy () from
>> /usr/lib/libboost_thread.so.1.42.0
>> #13 0x00007ffff79c58ca in start_thread (arg=<value optimized out>) at
>> pthread_create.c:300
>> #14 0x00007ffff4978b6d in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>> #15 0x0000000000000000 in ?? ()
>>
>>
>> Thread 27 (Thread 0x7fffe33ee700 (LWP 10949)):      // TableScanner destructor thread
>> #0  pthread_cond_wait@@GLIBC_2.3.2 () at
>> ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
>> #1  0x000000000047d87f in
>> boost::condition_variable_any::wait<boost::unique_lock<boost::mutex> >
>> (this=0x10e7c18, m=...)
>>     at /usr/include/boost/thread/pthread/condition_variable.hpp:84
>> #2  0x00000000005ed224 in
>> Hypertable::TableScannerAsync::wait_for_completion (this=0x10e7b50)
>>     at
>> /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerAsync.cc:535
>> #3  0x00000000005eb370 in ~TableScannerAsync (this=0x10e7b50,
>> __in_chrg=<value optimized out>)
>>     at
>> /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerAsync.cc:318
>> #4  0x00000000005f01d3 in
>> Hypertable::IndexScannerCallback::~IndexScannerCallback() ()
>> #5  0x00000000005eb579 in ~TableScannerAsync (this=0xc04ca0,
>> __in_chrg=<value optimized out>)
>>     at
>> /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerAsync.cc:324
>> #6  0x0000000000444847 in Hypertable::intrusive_ptr_release (rc=0xc04ca0)
>> at /opt/hypertable/0.9.6.0/include/Common/ReferenceCount.h:73
>> #7  0x00000000005e70e3 in
>> boost::intrusive_ptr<Hypertable::TableScannerAsync>::~intrusive_ptr() ()
>> #8  0x00000000005e6cf3 in Hypertable::TableScanner::~TableScanner() ()
>> #9  0x000000000043c943 in DBRecycled::run (this=0xa95c60) at
>> /home/qiao/Project/Bingo/DistributedSpider/DBRecycled.cpp:48
>> #10 0x000000000046eed7 in thread_proc (param=0x7fffe805ae00) at
>> /home/qiao/Project/Bingo/DistributedSpider/shared/Threading/ThreadPool.cpp:331
>> #11 0x00007ffff79c58ca in start_thread (arg=<value optimized out>) at
>> pthread_create.c:300
>> #12 0x00007ffff4978b6d in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>> #13 0x0000000000000000 in ?? ()
>>
>>
>>> Sorry for the delay - I was finally able to reproduce it and I also fixed
>>> it.
>>>
>>> The commit is a bit larger than my first try.
>>>
>>> https://github.com/cruppstahl/hypertable/commits/v0.9.5
>>>
>>> commit b45ba15b701373c3a1f689f8997f31bde8ff5165
>>> Author: Christoph Rupp <[email protected]>
>>> Date:   Wed Apr 25 18:54:11 2012 +0200
>>>
>>>     issue 827: fixed deadlock when scanning secondary indices
>>>
>>> Thanks again for your great help!
>>>
>>> Best regards
>>> Christoph
>>>
>>> 2012/4/26 gcc.lua <[email protected]>
>>>
>>>> Hi,
>>>>
>>>> Thanks for the quick reply. However, the commit only removes the m_mutex
>>>> lock inside virtual ~IndexScannerCallback(). I tried it, and a new
>>>> problem occurs (see the end of this report). Here is some additional
>>>> information on how I reproduced the issue before your commit:
>>>>
>>>> void run()
>>>> {
>>>>   TableScannerPtr aScanner =
>>>>       tbSourcelist->create_scanner( specbuilder.get(), 5000 );
>>>>
>>>>   while( aScanner->next( gotCell ) )
>>>>   {
>>>>     ....
>>>>     if(condition)
>>>>       break; // more results may be pending; the internal scanner
>>>>              // thread is still running
>>>>     ....
>>>>   }
>>>>   return; // triggers the TableScanner destructor; for what happens
>>>>           // next, see my first post
>>>> }
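
The scenario above (the owner goes out of scope while a background thread is still delivering results) can be illustrated with a small sketch. It is not Hypertable code: `BackgroundScan` and its members are hypothetical, and a plain `std::thread` stands in for the scanner thread. The point it shows is the ordering in the destructor: request cancellation, then wait for the worker, and only after that let the members the worker uses be destroyed.

```cpp
#include <atomic>
#include <thread>

// Hypothetical owner of a background worker that keeps delivering results.
class BackgroundScan {
public:
  explicit BackgroundScan(std::atomic<int> &delivered)
    : m_delivered(delivered), m_worker([this] { run(); }) {}

  ~BackgroundScan() {
    // Ask the worker to stop, then wait for it. No mutex is held here,
    // so the worker can always finish its current iteration.
    m_cancelled.store(true);
    m_worker.join();
    // Members are destroyed only after this point.
  }

private:
  void run() {
    while (!m_cancelled.load()) {
      m_delivered.fetch_add(1);   // stands in for delivering one result
      std::this_thread::yield();
    }
  }

  std::atomic<int> &m_delivered;
  std::atomic<bool> m_cancelled{false};
  std::thread m_worker;  // declared last: starts after the other members exist
};
```

Once the destructor has returned, the counter passed in no longer changes, because the worker has been joined before any member was torn down.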
>>>>
>>>> ////////////////////////////////////////////////////////////
>>>>
>>>>
>>>> pure virtual method called
>>>> terminate called without an active exception
>>>>
>>>> Program received signal SIGABRT, Aborted.
>>>> [Switching to Thread 0x7fffe6ff5700 (LWP 23887)]
>>>> 0x00007ffff48db1b5 in raise () from /lib/libc.so.6
>>>>
>>>>
>>>> (gdb) where
>>>> #0  0x00007ffff48db1b5 in raise () from /lib/libc.so.6
>>>> #1  0x00007ffff48ddfc0 in abort () from /lib/libc.so.6
>>>> #2  0x00007ffff516fdc5 in __gnu_cxx::__verbose_terminate_handler() ()
>>>> from /usr/lib/libstdc++.so.6
>>>> #3  0x00007ffff516e166 in ?? () from /usr/lib/libstdc++.so.6
>>>> #4  0x00007ffff516e193 in std::terminate() () from
>>>> /usr/lib/libstdc++.so.6
>>>> #5  0x00007ffff516ea6f in __cxa_pure_virtual () from
>>>> /usr/lib/libstdc++.so.6
>>>> #6  0x00000000005c43c6 in
>>>> Hypertable::TableScannerAsync::maybe_callback_ok
>>>> (this=0x7fffb432ecd0,
>>>> scanner_id=19373, next=true, do_callback=true, cells=...)
>>>>     at
>>>> /root/qiao/Project/hypertable-0.9.5.6/src/cc/Hypertable/Lib/TableScannerAsync.cc:520
>>>> #7  0x00000000005c393f in
>>>> Hypertable::TableScannerAsync::handle_result
>>>> (this=0x7fffb432ecd0, scanner_id=19373, event=..., is_create=true)
>>>>     at
>>>> /root/qiao/Project/hypertable-0.9.5.6/src/cc/Hypertable/Lib/TableScannerAsync.cc:464
>>>> #8  0x00000000005fdc5e in Hypertable::TableScannerHandler::run
>>>> (this=0x7fff99915850) at
>>>> /root/qiao/Project/hypertable-0.9.5.6/src/cc/Hypertable/Lib/TableScannerHandler.cc:40
>>>> #9  0x000000000045f2c5 in
>>>> Hypertable::ApplicationQueue::Worker::operator() (this=0xaaa120) at
>>>> /root/qiao/Project/hypertable-0.9.5.6/src/cc/AsyncComm/ApplicationQueue.h:173
>>>> #10 0x0000000000470f04 in
>>>> boost::detail::thread_data<Hypertable::ApplicationQueue::Worker>::run
>>>> (this=0xaa9ff0) at /usr/include/boost/thread/detail/thread.hpp:56
>>>> #11 0x00007ffff77b5200 in thread_proxy () from
>>>> /usr/lib/libboost_thread.so.1.42.0
>>>> #12 0x00007ffff79c58ca in start_thread () from /lib/libpthread.so.0
>>>> #13 0x00007ffff497892d in clone () from /lib/libc.so.6
>>>> #14 0x0000000000000000 in ?? ()
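
The "pure virtual method called" abort in this trace is consistent with a callback being invoked on an object whose derived part has already been destroyed: while a base-class destructor runs, virtual calls resolve to the base class, and if that function is pure virtual the runtime aborts. The sketch below shows the same dispatch rule with a non-pure virtual so that it can run safely. `ResultSink`, `IndexSink`, and `describe` are hypothetical names, not Hypertable classes.

```cpp
#include <string>

// Hypothetical base class; records which override runs during destruction.
struct ResultSink {
  explicit ResultSink(std::string &log) : m_log(log) {}
  virtual ~ResultSink() {
    // The derived part is already gone here, so this call resolves to
    // ResultSink::describe(), never to an override in a derived class.
    m_log = describe();
  }
  virtual std::string describe() const { return "base"; }
  std::string &m_log;
};

struct IndexSink : ResultSink {
  using ResultSink::ResultSink;
  std::string describe() const override { return "derived"; }
};

// Returns what the base destructor observed for an IndexSink object.
std::string dispatch_during_destruction() {
  std::string log;
  {
    IndexSink sink(log);
    log = sink.describe();  // object alive: dispatch reaches the override
  }                         // base destructor overwrites log
  return log;
}
```

If `describe` were declared pure virtual in the base class, the call made during destruction would have no implementation to reach, which is the situation `__cxa_pure_virtual` reports.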
>>>>
>>>> On April 26, 12:56 AM, Christoph Rupp <[email protected]> wrote:
>>>> > Hi,
>>>> >
>>>> > thanks for the great bug report.
>>>> >
>>>> > I am not able to reproduce this issue, but I think I came up with a
>>>> > fix. If you want to check out the sources then you can get them here:
>>>> > https://github.com/cruppstahl/hypertable branch "v0.9.5"
>>>> >
>>>> > This is the commit:
>>>> > commit 2572b5dcb524e1c36dc23307c37784fd34c1bdde
>>>> > Author: Christoph Rupp <[email protected]>
>>>> > Date:   Wed Apr 25 18:54:11 2012 +0200
>>>> >
>>>> >     issue 827: fixed deadlock when scanning secondary indices
>>>> >
>>>> > And here's the diff:
>>>> >
>>>> > diff --git a/src/cc/Hypertable/Lib/IndexScannerCallback.h
>>>> > b/src/cc/Hypertable/Li
>>>> > index 70ffda7..1b37127 100644
>>>> > --- a/src/cc/Hypertable/Lib/IndexScannerCallback.h
>>>> > +++ b/src/cc/Hypertable/Lib/IndexScannerCallback.h
>>>> > @@ -118,13 +118,12 @@ static String last;
>>>> >      }
>>>> >
>>>> >      virtual ~IndexScannerCallback() {
>>>> > -      ScopedLock lock(m_mutex);
>>>> > -      if (m_mutator)
>>>> > -        delete m_mutator;
>>>> >        foreach (TableScannerAsync *s, m_scanners)
>>>> >          delete s;
>>>> >        m_scanners.clear();
>>>> >        sspecs_clear();
>>>> > +      if (m_mutator)
>>>> > +        delete m_mutator;
>>>> >
>>>> > Can you please give it a try and see if this helps?
>>>> >
>>>> > Thanks
>>>> > Christoph
>>>> >
>>>> > 2012/4/24 gcc.lua <[email protected]>
>>>> >
>>>> > > The user thread logic is as follows:
>>>> > > TableScannerPtr aScanner =
>>>> > >     tbSourcelist->create_scanner( specbuilder.get(), 5000 );
>>>> > >  while( aScanner->next( gotCell ) )
>>>> > >  {
>>>> > >         .....
>>>> > >  }
>>>> >
>>>> > > A deadlock occurs between the user thread and the scanner thread:
>>>> >
>>>> > > 1. user thread (TableScanner)
>>>> >
>>>> > >    TableScannerAsync::~TableScannerAsync() {
>>>> > >  try {
>>>> > >    cancel();
>>>> > >    wait_for_completion();
>>>> > >  }
>>>> > >  catch (Exception &e) {
>>>> > >    HT_ERROR_OUT << e << HT_END;
>>>> > >  }
>>>> > >  if (m_use_index) {
>>>> > >    delete m_cb; // <=== deadlock entry point
>>>> > >    m_cb = 0;
>>>> > >  }
>>>> > > }
>>>> > > /////////////////////////////////////////
>>>> > >   virtual ~IndexScannerCallback() {
>>>> > >  ScopedLock lock(m_mutex); // <=== user thread acquires
>>>> > >                            //      IndexScannerCallback::m_mutex
>>>> > >      if (m_mutator)
>>>> > >        delete m_mutator;
>>>> >
>>>> > >      foreach (TableScannerAsync *s, m_scanners)
>>>> > >        delete s; // deadlock 1 <=== user thread waits for
>>>> > >                  //                 TableScannerAsync::m_mutex
>>>> >
>>>> > > 2. scanner thread
>>>> >
>>>> > >  void TableScannerAsync::handle_result(int scanner_id, EventPtr
>>>> > > &event, bool is_create) {
>>>> >
>>>> > >  bool cancelled = is_cancelled();
>>>> > >  ScopedLock lock(m_mutex); // <=== scanner thread acquires
>>>> > >                            //      TableScannerAsync::m_mutex
>>>> > >  ScanCellsPtr cells;
>>>> >
>>>> > >    . . . . . .
>>>> > >  maybe_callback_ok(); // <=== calls m_cb->scan_ok(this, cells);
>>>> >
>>>> > > }
>>>> > > //////////////////////////////
>>>> > >  class IndexScannerCallback : public ResultCallback {
>>>> >
>>>> > >    virtual void scan_ok(TableScannerAsync *scanner, ScanCellsPtr
>>>> > > &scancells) {
>>>> > >      bool is_eos = scancells->get_eos();
>>>> > >      String table_name = scanner->get_table_name();
>>>> >
>>>> > >      ScopedLock lock(m_mutex); // deadlock 2 <=== scanner thread waits
>>>> > >                                // for IndexScannerCallback::m_mutex
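
One general way to break a cycle like the one in this report is to avoid holding the scanner's own mutex while calling into the callback: update state under the lock, release it, then invoke the callback. The sketch below illustrates that idea only and is not the change made in Hypertable. `MiniScanner`, `MiniCallback`, and their members are hypothetical.

```cpp
#include <mutex>
#include <vector>

struct MiniScanner;

// Hypothetical callback with its own mutex, like the one in the report.
struct MiniCallback {
  std::mutex m;
  std::vector<int> seen;
  void scan_ok(MiniScanner &scanner, int cells);
};

struct MiniScanner {
  std::mutex m;
  int outstanding = 0;
  MiniCallback *cb = nullptr;

  void handle_result(int cells) {
    {
      // Touch scanner state only while holding the scanner mutex.
      std::lock_guard<std::mutex> lock(m);
      ++outstanding;
    }
    // The scanner mutex is released before calling out, so the callback
    // may take its own mutex or call back into the scanner.
    cb->scan_ok(*this, cells);
  }

  int get_outstanding() {
    std::lock_guard<std::mutex> lock(m);
    return outstanding;
  }
};

void MiniCallback::scan_ok(MiniScanner &scanner, int cells) {
  std::lock_guard<std::mutex> lock(m);
  // Re-entering the scanner here would block forever if handle_result
  // still held scanner.m, because std::mutex is not recursive.
  seen.push_back(cells + scanner.get_outstanding());
}
```

With this shape, no thread ever holds the scanner mutex while waiting for the callback mutex, so the callback-then-scanner order used by a destructor path cannot form a cycle with it.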
>>>> >
>>>> > > --
>>>> > > You received this message because you are subscribed to the Google
>>>> > > Groups "Hypertable Development" group.
>>>> > > To post to this group, send email to hyperta...@googlegroups.com.
>>>> > > To unsubscribe from this group, send email to
>>>> > > hypertable-de...@googlegroups.com.
>>>> > > For more options, visit this group at
>>>> > > http://groups.google.com/group/hypertable-dev?hl=en.
>>>>
>>>>
>>>>
>>>  --
>> You received this message because you are subscribed to the Google Groups
>> "Hypertable Development" group.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msg/hypertable-dev/-/sERE6hok0i0J.
>>
>
>

