Hi Mateusz, I assume you're seeing CPU utilization drop to zero, followed by a delay, and then this timeout. Correct? If that's the case, a stack trace from the Hypertable.RangeServer and Hypertable.Master processes at the time of the deadlock would be most helpful in chasing this one down. Thanks!
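If you don't already have a preferred way to grab those, attaching gdb to each process and dumping all threads usually does the trick. Something along these lines (the pgrep patterns are assumptions; match them to however the processes show up on your box):

    gdb -batch -ex "thread apply all bt" -p $(pgrep -f Hypertable.RangeServer) > rangeserver-stacks.txt
    gdb -batch -ex "thread apply all bt" -p $(pgrep -f Hypertable.Master) > master-stacks.txt

gdb only suspends the process for the moment it takes to walk the threads, so it should be safe to run against the live deadlock.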
- Doug

On Sun, Mar 29, 2009 at 8:21 AM, Mateusz Berezecki <[email protected]> wrote:
>
> Hi Doug,
>
> I've been running another stress test on a single machine for the 0.9.2.3
> release, and it seems that the deadlock problem has resurfaced. The patch
> I applied did fix things, but it also moved the problem to a different
> place in the code. The last time the deadlock occurred, it locked up the
> RangeServer and the error messages showed up in the RangeServer logs.
> This time the problem appears on the application side:
>
> 1238339511 WARN mergesort_splits :
> (/home/mateusz/hypertable/src/cc/Hypertable/Lib/TableMutatorDispatchHandler.cc:85)
> Event: type=ERROR "HYPERTABLE request timeout" from=172.16.0.19:38060,
> will retry ...
> 1238339511 WARN mergesort_splits :
> (/home/mateusz/hypertable/src/cc/AsyncComm/IOHandlerData.cc:348)
> Received response for non-pending event (id=672,version=1,total_len=40)
> 1238339550 ERROR mergesort_splits : handle_exceptions
> (/home/mateusz/hypertable/src/cc/Hypertable/Lib/TableMutator.cc:53):
> Hypertable::Exception: auto flushing - HYPERTABLE request timeout
>   at void Hypertable::TableMutator::auto_flush(Hypertable::Timer&)
>   (/home/mateusz/hypertable/src/cc/Hypertable/Lib/TableMutator.cc:215)
>   at void Hypertable::TableMutator::wait_for_previous_buffer(Hypertable::Timer&)
>   (/home/mateusz/hypertable/src/cc/Hypertable/Lib/TableMutator.cc:303):
>   waiting for previous buffer
>   at bool Hypertable::TableMutatorCompletionCounter::wait_for_completion(Hypertable::Timer&)
>   (/home/mateusz/hypertable/src/cc/Hypertable/Lib/TableMutatorCompletionCounter.h:71)
> terminate called after throwing an instance of 'Hypertable::Exception'
>   what():  auto flushing
> Aborted
>
> I have hit this deadlock in the two scenarios outlined below:
>
> 1. While running a long-lasting insertion process, I tried selecting in
> the CLI from the same table I was inserting into. This resulted in a
> deadlock, but it might have been just a coincidence, as the deadlock
> might have already been triggered.
>
> 2. Running a long-lasting insertion process on its own. The application
> basically does an external mergesort on approximately 40 GB of data and
> inserts the data in sorted order into the index table. This triggers the
> error pasted verbatim above.
>
> Should I grab the stack trace at the time the deadlock occurs from the
> application, the RangeServer, or both? Which one would be more helpful
> in investigating the bug?
>
> Mateusz
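One more note on the application-side abort: the "terminate called after throwing an instance of 'Hypertable::Exception'" at the end of your trace means the timeout exception is escaping uncaught, which is why the process dies before you can inspect it. While we chase the underlying deadlock, wrapping the mutator calls in a try/catch will at least keep the process alive long enough to attach a debugger. A rough sketch, assuming the 0.9.x C++ client API (the install dir, table, and column family names here are placeholders):

    #include <Common/Compat.h>
    #include <Hypertable/Lib/Client.h>
    #include <cstring>
    #include <iostream>
    #include <unistd.h>

    using namespace Hypertable;

    int main() {
      // Client constructor arguments vary a bit between releases; adjust to taste
      ClientPtr client = new Client("/opt/hypertable/current");
      TablePtr table = client->open_table("IndexTable");   // placeholder table
      TableMutatorPtr mutator = table->create_mutator();

      KeySpec key;
      key.row = "example-row";
      key.row_len = strlen("example-row");
      key.column_family = "data";                          // placeholder family

      try {
        // set() can trigger an auto-flush, which is where your timeout fires
        mutator->set(key, "value", 5);
        mutator->flush();
      }
      catch (Exception &e) {
        // Log and pause instead of letting terminate() abort the process
        std::cerr << "mutation failed: " << e.what() << std::endl;
        pause();  // keeps the wedged process around so gdb can attach
      }
      return 0;
    }

The pause() is just so the wedged state survives for inspection; drop it once we've got the traces we need.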
