We've found part of the problem. In an attempt to fix a RangeServer shutdown hang, I changed the CommitLog destructor to close the commit log fragment asynchronously. This introduced a race condition: the CommitLog class is also used for the split log, and under certain conditions the receiving range server would read the log _before_ the asynchronous close operation completed. The split log would then appear to be empty even though it contained valid commits, which caused all of those commits to disappear.
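To make the race concrete, here is a minimal C++ sketch. It is not Hypertable code: `Fragment`, `close_async`, and `demo_split_log_race` are hypothetical names, and the sketch assumes that appended commits only become visible to a reader once the fragment has been closed. A promise stands in for close latency so the ordering is deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a log fragment. Assumption: appended commits
// become visible to readers only after close() has run.
class Fragment {
public:
  void append(const std::string &commit) {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_pending.push_back(commit);
  }
  void close() {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_visible.insert(m_visible.end(), m_pending.begin(), m_pending.end());
    m_pending.clear();
  }
  std::vector<std::string> read() const {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_visible;
  }
private:
  mutable std::mutex m_mutex;
  std::vector<std::string> m_pending;
  std::vector<std::string> m_visible;
};

// Starts the close on another thread and returns at once, mimicking a
// destructor that closes asynchronously. The gate simulates close latency.
std::future<void> close_async(std::shared_ptr<Fragment> frag,
                              std::shared_future<void> gate) {
  return std::async(std::launch::async, [frag, gate] {
    gate.wait();
    frag->close();
  });
}

struct RaceDemoResult {
  std::size_t early;  // commits seen before the close completed
  std::size_t late;   // commits seen after waiting for the close
};

RaceDemoResult demo_split_log_race() {
  auto frag = std::make_shared<Fragment>();
  frag->append("commit-1");
  frag->append("commit-2");

  std::promise<void> release;
  std::future<void> closed = close_async(frag, release.get_future().share());
  assert(closed.valid());

  // The reader gets in before the asynchronous close has finished:
  // the log looks empty even though two commits were written.
  std::size_t early = frag->read().size();

  // Let the close proceed and wait for it before reading again.
  release.set_value();
  closed.wait();
  std::size_t late = frag->read().size();

  return {early, late};
}
```

Calling `demo_split_log_race()` returns an `early` count of 0 and a `late` count of 2. Under the stated assumption, the reader has to wait for the close to complete (or the close has to be synchronous on the split-log path) before the log is handed to the receiving server.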
Unfortunately, we've also noticed a similar symptom (though much less frequently) in a prior commit. We've built tools to diagnose the problem and we're running tests this weekend. We should have some more results by Monday.

My apologies to those of you who have been waiting on this release. We're on this 100% and are doing everything we can to get to the bottom of it as soon as possible.

- Doug

On Tue, Mar 15, 2011 at 10:10 AM, Doug Judd <[email protected]> wrote:

> This is an update on the status of the 0.9.5.0 pre-release. This release
> is aimed at fixing the stability problems that have been reported under
> certain scenarios. It has involved some major code changes, including a
> re-write of the client scanner logic and a complete overhaul of the master.
> We're essentially "code complete" on the release, but a showstopper bug
> (data loss) turned up in our final testing. We're in the process of
> isolating the commit that introduced the bug, but it's slow going, taking
> about 2-4 hours to reproduce. We'll keep you updated on our progress.
>
> - Doug

--
You received this message because you are subscribed to the Google Groups "Hypertable Development" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/hypertable-dev?hl=en.
