Hi Doug, I think it's better to open a new thread on this topic :)
The crash with multiple maintenance threads is easy to reproduce: set Hypertable.RangeServer.MaintenanceThreads=2, start all servers locally on a single node, and run random_write_test 10000000000. The range server will crash within a minute. The cause, however, is hard to track down.

What we know so far:

1. The bug was introduced in version 0.9.0.11. Earlier versions don't have this problem.

2. According to RangeServer.log, the crash usually happens when two adjacent ranges are being split concurrently in two maintenance threads. If we forbid this by modifying the MaintenanceTaskQueue code, the crash goes away, but we still don't know why. (Pheonix discovered this.)

3. Sometimes the range server fails at HT_EXPECT(m_immutable_cache_ptr, Error::FAILED_EXPECTATION); in AccessGroup::run_compaction(). m_immutable_cache_ptr is set to 0 in multiple places with m_mutex locked, but it is not always checked with the lock held, which looks suspicious.

Do you have any ideas based on these facts?

Donald
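
P.S. To make point 3 more concrete, here is a minimal sketch of the pattern I would expect around m_immutable_cache_ptr: take a snapshot of the pointer while holding m_mutex, then test and use only the snapshot. This is not the real Hypertable code. CellCache, AccessGroupSketch, freeze() and drop() are hypothetical stand-ins, std::shared_ptr and std::mutex are used only for illustration, and I am assuming the unlocked check is the problem, which we have not confirmed.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for the immutable cell cache (not Hypertable's class).
struct CellCache {
  std::string data;
};

// Hypothetical, simplified access group; only the locking pattern matters.
class AccessGroupSketch {
public:
  // Installs an immutable cache, with m_mutex held.
  void freeze(const std::string &data) {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_immutable_cache_ptr = std::make_shared<CellCache>(CellCache{data});
  }

  // Resets the pointer to null, with m_mutex held.
  void drop() {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_immutable_cache_ptr.reset();
  }

  // Copies the pointer under the lock, then checks and uses only the copy.
  // Returns false instead of asserting when there is nothing to compact.
  bool run_compaction(std::string &out) {
    std::shared_ptr<CellCache> cache;
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      cache = m_immutable_cache_ptr;  // snapshot taken with the lock held
    }
    if (!cache)
      return false;  // another thread already cleared it
    out = cache->data;  // the snapshot keeps the cache alive while we read it
    return true;
  }

private:
  std::mutex m_mutex;
  std::shared_ptr<CellCache> m_immutable_cache_ptr;
};
```

With this shape, a concurrent drop() between the check and the use cannot invalidate the cache that run_compaction() is reading, because the local copy holds its own reference. Whether a missing cache should be an error or simply "nothing to do" is a separate design question.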
