On 10 Aug 2010, at 10:55, Robert Newson wrote:

> I ran the db_repair code on a healthy database produced with delayed_commits=true.
>
> The source db had 3218 docs. db_repair recovered 3120 and then returned with ok.
> This looks like we are recovering nodes that don't need recovering, because on a healthy db produced with delayed_commits=true we should not have any orphans at all, so the lost and found db should be empty.
>
> I'm redoing that test, but this indicates we're not finding all roots.
>
> I note that the output file was 36 times the input file, which is a consequence of folding all possible roots. I think that needs to be in the release notes for the repair tool if that behavior remains when it ships.
>
> B.
>
> On Tue, Aug 10, 2010 at 9:09 AM, Mikeal Rogers <[email protected]> wrote:
>> I think I found a bug in the current lost+found repair.
>>
>> I've been running it against the testwritesdb and it's in a state that is never finishing.
>>
>> It's still spitting out these lines:
>>
>> [info] [<0.32.0>] writing 1001 updates to lost+found/testwritesdb
>>
>> Most are 1001, but there are also other random counts: 452, 866, etc.
>>
>> But the file size and dbinfo haven't budged in over 30 minutes. The size is stuck at 34300002, with the original db file being 54857478.
>>
>> This database only has one document in it that isn't "lost", so if it's finding *any* new docs it should be writing them.
>>
>> I also started another job to recover a production db that is quite large, 500 megs, with the missing data a week or so back. This has been running for 2 hours and has still not output anything or created the lost and found db, so I can only assume that it is in the same state.
>>
>> Both machines are still churning at 100% CPU.
>>
>> -Mikeal
>>
>> On Mon, Aug 9, 2010 at 11:26 PM, Adam Kocoloski <[email protected]> wrote:
>>
>>> With Randall's help we hooked the new node scanner up to the lost+found DB generator. It seems to work well enough for small DBs; for large DBs with lots of missing nodes the O(N^2) complexity of the problem catches up to the code and generating the lost+found DB takes quite some time. Mikeal is running tests tonight. The algo appears pretty CPU-limited, so a little parallelization may be warranted.
>>>
>>> http://github.com/kocolosk/couchdb/tree/db_repair
>>>
>>> Adam
>>>
>>> (I sent this previous update to myself instead of the list, so I'll forward it here ...)
>>>
>>> On Aug 10, 2010, at 12:01 AM, Adam Kocoloski wrote:
>>>
>>>> On Aug 9, 2010, at 10:10 PM, Adam Kocoloski wrote:
>>>>
>>>>> Right, make_lost_and_found still relies on code which reads through couch_file one byte at a time; that's the cause of the slowness. The newer scanner will improve that pretty dramatically, and we can tune it further by increasing the length of the pattern that we match when looking for kp/kv_node terms in the files, at the expense of some extra complexity dealing with the block prefixes (currently it does a 1-byte match, which as I understand it cannot be split across blocks).
>>>>
>>>> The scanner now looks for a 7 byte match, unless it is within 6 bytes of a block boundary, in which case it looks for the longest possible match at that position. The more specific match condition greatly reduces the # of calls to couch_file, and thus boosts the throughput. On my laptop it can scan the testwritesdb.couch from Mikeal's couchtest repo (52 MB) in 18 seconds.
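For anyone trying to picture the match rule Adam describes, here is a rough, untested sketch of that kind of scan. The 4096-byte block size is couch_file's block size, but the 7-byte pattern, the module name and the helper names are illustrative assumptions, not the code on the db_repair branch:

    %% Sketch only: scan a binary chunk for positions that look like the
    %% start of a serialized btree node term. Near a block boundary only
    %% the longest prefix of the pattern that fits before the boundary is
    %% required to match, mirroring the behaviour described above.
    -module(node_scan_sketch).
    -export([candidate_offsets/2]).

    -define(BLOCK_SIZE, 4096).
    %% Illustrative stand-in for "the first 7 bytes of a kv_node term";
    %% the real branch may match on different bytes.
    -define(PATTERN, <<131, 104, 2, 100, 0, 7, 107>>).

    %% Chunk starts at absolute file offset FileOffset; returns the
    %% absolute offsets of every candidate match.
    candidate_offsets(Chunk, FileOffset) ->
        candidate_offsets(Chunk, FileOffset, 0, []).

    candidate_offsets(Chunk, FileOffset, Pos, Acc) when Pos < byte_size(Chunk) ->
        Abs = FileOffset + Pos,
        BytesToBoundary = ?BLOCK_SIZE - (Abs rem ?BLOCK_SIZE),
        MatchLen = min(byte_size(?PATTERN), BytesToBoundary),
        <<Want:MatchLen/binary, _/binary>> = ?PATTERN,
        NewAcc = case Chunk of
            <<_:Pos/binary, Got:MatchLen/binary, _/binary>> when Got =:= Want ->
                [Abs | Acc];
            _ ->
                Acc
        end,
        candidate_offsets(Chunk, FileOffset, Pos + 1, NewAcc);
    candidate_offsets(_Chunk, _FileOffset, _Pos, Acc) ->
        lists:reverse(Acc).

Each returned offset is only a guess; the caller still has to try reading a term there (e.g. with couch_file:pread_term/2) and be prepared for that to fail.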
>>>>> Regarding the file_corruption error on the larger file, I think this is something we will just naturally trigger when we take a guess that random positions in a file are actually the beginning of a term. I think our best recourse here is to return {error, file_corruption} from couch_file but leave the gen_server up and running instead of terminating it. That way the repair code can ignore the error and keep moving without having to reopen the file.
>>>>
>>>> I committed this change (to my db_repair branch) after consulting with Chris. The longer match condition makes these spurious file_corruption triggers much less likely, but I think it's still a good thing not to crash the server when they happen.
>>>>
>>>>> Next steps as I understand them - Randall is working on integrating the in-memory scanner into Volker's code that finds all the dangling by_id nodes. I'm working on making sure that the scanner identifies bt node candidates which span block prefixes, and on improving its pattern-matching.
>>>>
>>>> Latest from my end
>>>> http://github.com/kocolosk/couchdb/tree/db_repair
>>>>
>>>>> Adam
>>>>>
>>>>> On Aug 9, 2010, at 9:50 PM, Mikeal Rogers wrote:
>>>>>
>>>>>> I pulled down the latest code from Adam's branch @ 7080ff72baa329cf6c4be2a79e71a41f744ed93b.
>>>>>>
>>>>>> Running timer:tc(couch_db_repair, make_lost_and_found, ["multi_conflict"]). on a database with 200 lost updates spanning 200 restarts ( http://github.com/mikeal/couchtest/blob/master/multi_conflict.couch ) took about 101 seconds.
>>>>>>
>>>>>> I tried running against a larger database ( http://github.com/mikeal/couchtest/blob/master/testwritesdb.couch ) and I got this exception:
>>>>>>
>>>>>> http://gist.github.com/516491
>>>>>>
>>>>>> -Mikeal
>>>>>>
>>>>>> On Mon, Aug 9, 2010 at 6:09 PM, Randall Leeds <[email protected]> wrote:
>>>>>>
>>>>>>> Summing up what went on in IRC for those who were absent.
>>>>>>>
>>>>>>> The latest progress is on Adam's branch at
>>>>>>> http://github.com/kocolosk/couchdb/tree/db_repair
>>>>>>>
>>>>>>> couch_db_repair:make_lost_and_found/1 attempts to create a new lost+found/DbName database to which it merges all nodes not accessible from anywhere (any other node found in a full file scan or any header pointers).
>>>>>>>
>>>>>>> Currently, make_lost_and_found uses Volker's repair (from the couch_db_repair_b module, also in Adam's branch). Adam found that the bottleneck was couch_file calls and that the repair process was taking a very long time, so he added couch_db_repair:find_nodes_quickly/1, which reads 1MB chunks as binary and tries to process them to find nodes instead of scanning back one byte at a time. It is currently not hooked up to the repair mechanism.
>>>>>>>
>>>>>>> Making progress. Go team.
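To make the 1MB-chunk idea concrete, here is a rough sketch of that kind of full-file scan, feeding each slice to an in-memory scanner like the one sketched earlier. The chunk size, overlap and names are assumptions for illustration; this is not the actual couch_db_repair:find_nodes_quickly/1 code:

    %% Sketch only: read the raw .couch file in 1 MB slices and collect
    %% candidate node offsets from each slice.
    -module(chunk_scan_sketch).
    -export([scan_file/1]).

    -define(CHUNK_SIZE, 1024 * 1024).
    %% Overlap successive chunks by a few bytes so a pattern straddling a
    %% chunk boundary is still seen in one piece.
    -define(OVERLAP, 16).

    scan_file(Path) ->
        {ok, Fd} = file:open(Path, [read, raw, binary]),
        try scan_chunks(Fd, 0, [])
        after file:close(Fd)
        end.

    scan_chunks(Fd, Offset, Acc) ->
        case file:pread(Fd, Offset, ?CHUNK_SIZE + ?OVERLAP) of
            {ok, Chunk} ->
                Found = node_scan_sketch:candidate_offsets(Chunk, Offset),
                case byte_size(Chunk) > ?CHUNK_SIZE of
                    true ->
                        scan_chunks(Fd, Offset + ?CHUNK_SIZE, [Found | Acc]);
                    false ->
                        lists:append(lists:reverse([Found | Acc]))
                end;
            eof ->
                lists:append(lists:reverse(Acc))
        end.

Matches that fall inside the overlap region can show up twice, so a real implementation would de-duplicate the offsets before probing them.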
>>>>>>> On Mon, Aug 9, 2010 at 13:52, Mikeal Rogers <[email protected]> wrote:
>>>>>>>> jchris suggested on IRC that I try a normal doc update and see if that fixes it.
>>>>>>>>
>>>>>>>> It does. After a new doc was created the dbinfo doc count was back to normal.
>>>>>>>>
>>>>>>>> -Mikeal
>>>>>>>>
>>>>>>>> On Mon, Aug 9, 2010 at 1:39 PM, Mikeal Rogers <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Ok, I pulled down this code and tested against a database with a ton of missing writes right before a single restart.
>>>>>>>>>
>>>>>>>>> Before restart this was the database:
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>> db_name: "testwritesdb"
>>>>>>>>> doc_count: 124969
>>>>>>>>> doc_del_count: 0
>>>>>>>>> update_seq: 124969
>>>>>>>>> purge_seq: 0
>>>>>>>>> compact_running: false
>>>>>>>>> disk_size: 54857478
>>>>>>>>> instance_start_time: "1281384140058211"
>>>>>>>>> disk_format_version: 5
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> After restart it was this:
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>> db_name: "testwritesdb"
>>>>>>>>> doc_count: 1
>>>>>>>>> doc_del_count: 0
>>>>>>>>> update_seq: 1
>>>>>>>>> purge_seq: 0
>>>>>>>>> compact_running: false
>>>>>>>>> disk_size: 54857478
>>>>>>>>> instance_start_time: "1281384593876026"
>>>>>>>>> disk_format_version: 5
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> After repair, it's this:
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>> db_name: "testwritesdb"
>>>>>>>>> doc_count: 1
>>>>>>>>> doc_del_count: 0
>>>>>>>>> update_seq: 124969
>>>>>>>>> purge_seq: 0
>>>>>>>>> compact_running: false
>>>>>>>>> disk_size: 54857820
>>>>>>>>> instance_start_time: "1281385990193289"
>>>>>>>>> disk_format_version: 5
>>>>>>>>> committed_update_seq: 124969
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> All the sequences are there and hitting _all_docs shows all the documents, so why is the doc_count only 1 in the dbinfo?
>>>>>>>>>
>>>>>>>>> -Mikeal
>>>>>>>>>
>>>>>>>>> On Mon, Aug 9, 2010 at 11:53 AM, Filipe David Manana <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> For the record (and people not on IRC), the code at:
>>>>>>>>>>
>>>>>>>>>> http://github.com/fdmanana/couchdb/commits/db_repair
>>>>>>>>>>
>>>>>>>>>> is working for at least simple cases. Use couch_db_repair:repair(DbNameAsString).
>>>>>>>>>> There's one TODO: update the reduce values for the by_seq and by_id BTrees.
>>>>>>>>>>
>>>>>>>>>> If anyone wants to give some help on this, you're welcome.
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 9, 2010 at 6:12 PM, Mikeal Rogers <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm starting to create a bunch of test db files that expose this bug under different conditions like multiple restarts, across compaction, variances in updates that might cause conflicts, etc.
>>>>>>>>>>>
>>>>>>>>>>> http://github.com/mikeal/couchtest
>>>>>>>>>>>
>>>>>>>>>>> The README outlines what was done to the db's and what needs to be recovered.
>>>>>>>>>>>
>>>>>>>>>>> -Mikeal
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 9, 2010 at 9:33 AM, Filipe David Manana <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Aug 9, 2010 at 5:22 PM, Robert Newson <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Doesn't this bit;
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Db#db{waiting_delayed_commit=nil};
>>>>>>>>>>>>> + Db;
>>>>>>>>>>>>> + % Db#db{waiting_delayed_commit=nil};
>>>>>>>>>>>>>
>>>>>>>>>>>>> revert the bug fix?
>>>>>>>>>>>>
>>>>>>>>>>>> That's intentional, for my local testing.
>>>>>>>>>>>> That patch obviously isn't anything close to final; it's still too experimental.
>>>>>>>>>>>>
>>>>>>>>>>>>> B.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Aug 9, 2010 at 5:09 PM, Jan Lehnardt <[email protected]> wrote:
>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Filipe jumped in to start working on the recovery tool, but he isn't done yet.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here's the current patch:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://www.friendpaste.com/4uMngrym4r7Zz4R0ThSHbz
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> it is not done and very early, but any help on this is greatly appreciated.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The current state is (in Filipe's words):
>>>>>>>>>>>>>> - i can detect that a file needs repair
>>>>>>>>>>>>>> - and get the last btree roots from it
>>>>>>>>>>>>>> - "only" missing: get last db seq num
>>>>>>>>>>>>>> - write new header
>>>>>>>>>>>>>> - and deal with the local docs btree (if exists)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>> Jan
>>>>>>>>>>>>>> --
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Filipe David Manana,
>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>
>>>>>>>>>>>> "Reasonable men adapt themselves to the world.
>>>>>>>>>>>> Unreasonable men adapt the world to themselves.
>>>>>>>>>>>> That's why all progress depends on unreasonable men."
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Filipe David Manana,
>>>>>>>>>> [email protected]
>>>>>>>>>>
>>>>>>>>>> "Reasonable men adapt themselves to the world.
>>>>>>>>>> Unreasonable men adapt the world to themselves.
>>>>>>>>>> That's why all progress depends on unreasonable men."
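And since it comes up a couple of times above, a toy illustration of Adam's point about returning {error, file_corruption} from couch_file rather than stopping the gen_server. The module below is not couch_file, and its on-disk layout (a plain 4-byte length prefix followed by term_to_binary data) is a simplification, not the real file format; the point is only the shape of handle_call, which replies with the error instead of terminating:

    -module(file_guess_sketch).
    -behaviour(gen_server).
    -export([start_link/1, pread_term/2]).
    -export([init/1, handle_call/3, handle_cast/2, handle_info/2,
             terminate/2, code_change/3]).

    start_link(Path) ->
        gen_server:start_link(?MODULE, Path, []).

    %% Try to read a term at a guessed offset; this may legitimately fail
    %% when repair code probes positions that aren't really terms.
    pread_term(Server, Pos) ->
        gen_server:call(Server, {pread_term, Pos}).

    init(Path) ->
        {ok, Fd} = file:open(Path, [read, raw, binary]),
        {ok, Fd}.

    handle_call({pread_term, Pos}, _From, Fd) ->
        Reply = case file:pread(Fd, Pos, 4) of
            {ok, <<Len:32/integer>>} when Len > 0 ->
                case file:pread(Fd, Pos + 4, Len) of
                    {ok, Bin} when byte_size(Bin) =:= Len ->
                        try
                            {ok, binary_to_term(Bin)}
                        catch
                            error:badarg -> {error, file_corruption}
                        end;
                    _ ->
                        {error, file_corruption}
                end;
            _ ->
                {error, file_corruption}
        end,
        %% Corruption is reported as a reply rather than {stop, ...}, so a
        %% caller probing random offsets doesn't take the file server down
        %% and doesn't need to reopen the file.
        {reply, Reply, Fd}.

    handle_cast(_Msg, Fd) -> {noreply, Fd}.
    handle_info(_Info, Fd) -> {noreply, Fd}.
    terminate(_Reason, Fd) -> file:close(Fd).
    code_change(_OldVsn, Fd, _Extra) -> {ok, Fd}.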
