Hi Doug,

I've been running another stress test on a single machine for the
0.9.2.3 release, and it seems the deadlock problem has resurfaced. The
patch I applied did fix the original lockup, but the problem seems to
have moved to a different place in the code. Last time the deadlock
occurred, it locked up the RangeServer and the error messages showed
up in the RangeServer logs. This time the problem appears on the
application side:

1238339511 WARN mergesort_splits :
(/home/mateusz/hypertable/src/cc/Hypertable/Lib/TableMutatorDispatchHandler.cc:85)
Event: type=ERROR "HYPERTABLE request timeout" from=172.16.0.19:38060,
will retry ...
1238339511 WARN mergesort_splits :
(/home/mateusz/hypertable/src/cc/AsyncComm/IOHandlerData.cc:348)
Received response for non-pending event
(id=672,version=1,total_len=40)
1238339550 ERROR mergesort_splits : handle_exceptions
(/home/mateusz/hypertable/src/cc/Hypertable/Lib/TableMutator.cc:53):
Hypertable::Exception: auto flushing - HYPERTABLE request timeout
        at void
Hypertable::TableMutator::auto_flush(Hypertable::Timer&)
(/home/mateusz/hypertable/src/cc/Hypertable/Lib/TableMutator.cc:215)
        at void
Hypertable::TableMutator::wait_for_previous_buffer(Hypertable::Timer&)
(/home/mateusz/hypertable/src/cc/Hypertable/Lib/TableMutator.cc:303):
waiting for previous buffer
        at bool
Hypertable::TableMutatorCompletionCounter::wait_for_completion(Hypertable::Timer&)
(/home/mateusz/hypertable/src/cc/Hypertable/Lib/TableMutatorCompletionCounter.h:71):
terminate called after throwing an instance of 'Hypertable::Exception'
  what():  auto flushing
Aborted

I have hit this deadlock in the two scenarios outlined below:
1. While running a long-lasting insertion process, I tried selecting in
the CLI from the same table I was inserting into. This resulted in a
deadlock, but it might have been just a coincidence, as the deadlock
might already have been triggered.

2. Running a long-lasting insertion process. The application
basically does an external mergesort on approx. 40 GB of data and
inserts the data in sorted order into the index table. This triggers
the error I pasted verbatim above.

Should I grab a stack trace at the time the deadlock occurs from the
application, the RangeServer, or both? Which one would be more
helpful in investigating the bug?

Mateusz
