Filipe,
I'm not sure which changes you're talking about exactly, but I know Adam
and I decided to use the old gen_server:call({pread, ...}) interface that
can read arbitrary positions as binaries. The reason for this is so that
the scanner can read large chunks in one call and then analyze them for
node terms.
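
Roughly the shape I have in mind (an untested sketch; find_node_candidates/2
is a made-up helper, and the exact pread message may differ on your branch):

    %% Scan the file in 1 MB chunks instead of issuing a couch_file
    %% call per byte. {pread, Pos, Len} is assumed to return {ok, Bin}.
    scan_file(_Fd, Pos, Size) when Pos >= Size ->
        ok;
    scan_file(Fd, Pos, Size) ->
        Len = erlang:min(1024 * 1024, Size - Pos),
        {ok, Chunk} = gen_server:call(Fd, {pread, Pos, Len}, infinity),
        find_node_candidates(Chunk, Pos),  % hunt for kp/kv_node terms
        scan_file(Fd, Pos + Len, Size).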

On Tue, Aug 10, 2010 at 02:46, Filipe David Manana <[email protected]> wrote:
> Is it my impression, or do the forks I looked at (Volker, Adam, Randall) not
> use the changes I made to couch_file? They were needed to try reading terms
> from random positions in the DB file, because if we try to read from a bad
> position, the couch_file gen_server crashed and was never restarted (it's
> not under a supervision tree).
>
> On Tue, Aug 10, 2010 at 10:28 AM, Filipe David Manana <[email protected]> wrote:
>>
>> On Tue, Aug 10, 2010 at 9:55 AM, Robert Newson <[email protected]> wrote:
>>
>>> I ran the db_repair code on a healthy database produced with
>>> delayed_commits=true.
>>>
>>> The source db had 3218 docs. db_repair recovered 3120 and then returned
>>> with ok.
>>
>> When a DB is repaired, couch_db_repair:repair/1 returns something matching
>> {ok, repaired, _BTreeInfos}.
>> If it returns only the atom 'ok', it means it did nothing to the DB file.
>> At least in my original code; dunno if the forks changed that behaviour.
>>
>>> I'm redoing that test, but this indicates we're not finding all roots.
>>>
>>> I note that the output file was 36 times the size of the input file, which
>>> is a consequence of folding all possible roots. I think that needs to be
>>> in the release notes for the repair tool if that behavior remains when it
>>> ships.
>>>
>>> B.
>>>
>>> On Tue, Aug 10, 2010 at 9:09 AM, Mikeal Rogers <[email protected]> wrote:
>>> > I think I found a bug in the current lost+found repair.
>>> >
>>> > I've been running it against the testwritesdb and it's in a state that
>>> > is never finishing.
>>> >
>>> > It's still spitting out these lines:
>>> >
>>> > [info] [<0.32.0>] writing 1001 updates to lost+found/testwritesdb
>>> >
>>> > Most are 1001, but there are also other random variances: 452, 866, etc.
>>> >
>>> > But the file size and dbinfo haven't budged in over 30 minutes. The size
>>> > is stuck at 34300002, with the original db file being 54857478.
>>> >
>>> > This database only has one document in it that isn't "lost", so if it's
>>> > finding *any* new docs it should be writing them.
>>> >
>>> > I also started another job to recover a production db that is quite
>>> > large, 500 MB, with the missing data a week or so back. This has been
>>> > running for 2 hours and has still not output anything or created the
>>> > lost+found db, so I can only assume that it is in the same state.
>>> >
>>> > Both machines are still churning 100% CPU.
>>> >
>>> > -Mikeal
>>> >
>>> > On Mon, Aug 9, 2010 at 11:26 PM, Adam Kocoloski <[email protected]> wrote:
>>> >
>>> >> With Randall's help we hooked the new node scanner up to the lost+found
>>> >> DB generator. It seems to work well enough for small DBs; for large DBs
>>> >> with lots of missing nodes the O(N^2) complexity of the problem catches
>>> >> up to the code and generating the lost+found DB takes quite some time.
>>> >> Mikeal is running tests tonight. The algo appears pretty CPU-limited,
>>> >> so a little parallelization may be warranted.
>>> >>
>>> >> http://github.com/kocolosk/couchdb/tree/db_repair
>>> >>
>>> >> Adam
>>> >>
>>> >> (I sent this previous update to myself instead of the list, so I'll
>>> >> forward it here ...)
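
On the "CPU-limited" point above: the scan shards naturally across
processes. An untested sketch, reusing the made-up scan_file/3 from my note
at the top; in practice the ranges should overlap by a few bytes so a
candidate spanning a range boundary isn't missed:

    %% Split [0, Size) into NumWorkers ranges and scan them in parallel.
    pscan(Fd, Size, NumWorkers) ->
        ChunkSz = Size div NumWorkers + 1,
        Parent = self(),
        Pids = [spawn_link(fun() ->
                    From = N * ChunkSz,
                    To = erlang:min(Size, From + ChunkSz),
                    Parent ! {self(), scan_file(Fd, From, To)}
                end) || N <- lists:seq(0, NumWorkers - 1)],
        %% collect each worker's result, in order
        [receive {Pid, Res} -> Res end || Pid <- Pids].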
>>> >> On Aug 10, 2010, at 12:01 AM, Adam Kocoloski wrote:
>>> >>
>>> >> > On Aug 9, 2010, at 10:10 PM, Adam Kocoloski wrote:
>>> >> >
>>> >> >> Right, make_lost_and_found still relies on code which reads through
>>> >> >> couch_file one byte at a time; that's the cause of the slowness. The
>>> >> >> newer scanner will improve that pretty dramatically, and we can tune
>>> >> >> it further by increasing the length of the pattern that we match when
>>> >> >> looking for kp/kv_node terms in the files, at the expense of some
>>> >> >> extra complexity dealing with the block prefixes (currently it does a
>>> >> >> 1-byte match, which as I understand it cannot be split across blocks).
>>> >> >
>>> >> > The scanner now looks for a 7-byte match, unless it is within 6 bytes
>>> >> > of a block boundary, in which case it looks for the longest possible
>>> >> > match at that position. The more specific match condition greatly
>>> >> > reduces the # of calls to couch_file, and thus boosts the throughput.
>>> >> > On my laptop it can scan the testwritesdb.couch from Mikeal's
>>> >> > couchtest repo (52 MB) in 18 seconds.
>>> >> >
>>> >> >> Regarding the file_corruption error on the larger file, I think this
>>> >> >> is something we will just naturally trigger when we take a guess that
>>> >> >> random positions in a file are actually the beginning of a term. I
>>> >> >> think our best recourse here is to return {error, file_corruption}
>>> >> >> from couch_file but leave the gen_server up and running instead of
>>> >> >> terminating it. That way the repair code can ignore the error and
>>> >> >> keep moving without having to reopen the file.
>>> >> >
>>> >> > I committed this change (to my db_repair branch) after consulting with
>>> >> > Chris. The longer match condition makes these spurious file_corruption
>>> >> > triggers much less likely, but I think it's still a good thing not to
>>> >> > crash the server when they happen.
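
For those following along, the non-crashing error path Adam describes is
essentially this shape (a sketch only; read_term/2 stands in for
couch_file's internal read, and the real clause on his branch may differ):

    %% Reply with {error, file_corruption} instead of {stop, ...} so the
    %% repair scan can skip a bad guess without reopening the file.
    handle_call({pread_term, Pos}, _From, File) ->
        case catch read_term(File, Pos) of
            {ok, Term} ->
                {reply, {ok, Term}, File};
            _Error ->
                {reply, {error, file_corruption}, File}
        end.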
>>> >> >
>>> >> >> Next steps as I understand them: Randall is working on integrating
>>> >> >> the in-memory scanner into Volker's code that finds all the dangling
>>> >> >> by_id nodes. I'm working on making sure that the scanner identifies
>>> >> >> bt node candidates which span block prefixes, and on improving its
>>> >> >> pattern-matching.
>>> >> >
>>> >> > Latest from my end:
>>> >> > http://github.com/kocolosk/couchdb/tree/db_repair
>>> >> >
>>> >> >> Adam
>>> >> >>
>>> >> >> On Aug 9, 2010, at 9:50 PM, Mikeal Rogers wrote:
>>> >> >>
>>> >> >>> I pulled down the latest code from Adam's branch @
>>> >> >>> 7080ff72baa329cf6c4be2a79e71a41f744ed93b.
>>> >> >>>
>>> >> >>> Running timer:tc(couch_db_repair, make_lost_and_found, ["multi_conflict"]).
>>> >> >>> on a database with 200 lost updates spanning 200 restarts
>>> >> >>> (http://github.com/mikeal/couchtest/blob/master/multi_conflict.couch)
>>> >> >>> took about 101 seconds.
>>> >> >>>
>>> >> >>> I tried running against a larger database
>>> >> >>> (http://github.com/mikeal/couchtest/blob/master/testwritesdb.couch)
>>> >> >>> and I got this exception:
>>> >> >>>
>>> >> >>> http://gist.github.com/516491
>>> >> >>>
>>> >> >>> -Mikeal
>>> >> >>>
>>> >> >>> On Mon, Aug 9, 2010 at 6:09 PM, Randall Leeds <[email protected]> wrote:
>>> >> >>>
>>> >> >>>> Summing up what went on in IRC for those who were absent.
>>> >> >>>>
>>> >> >>>> The latest progress is on Adam's branch at
>>> >> >>>> http://github.com/kocolosk/couchdb/tree/db_repair
>>> >> >>>>
>>> >> >>>> couch_db_repair:make_lost_and_found/1 attempts to create a new
>>> >> >>>> lost+found/DbName database to which it merges all nodes not
>>> >> >>>> accessible from anywhere (any other node found in a full file scan
>>> >> >>>> or any header pointers).
>>> >> >>>>
>>> >> >>>> Currently, make_lost_and_found uses Volker's repair (from the
>>> >> >>>> couch_db_repair_b module, also in Adam's branch).
>>> >> >>>> Adam found that the bottleneck was couch_file calls and that the
>>> >> >>>> repair process was taking a very long time, so he added
>>> >> >>>> couch_db_repair:find_nodes_quickly/1, which reads 1 MB chunks as
>>> >> >>>> binary and tries to process them to find nodes, instead of scanning
>>> >> >>>> back one byte at a time. It is currently not hooked up to the
>>> >> >>>> repair mechanism.
>>> >> >>>>
>>> >> >>>> Making progress. Go team.
>>> >> >>>>
>>> >> >>>> On Mon, Aug 9, 2010 at 13:52, Mikeal Rogers <[email protected]> wrote:
>>> >> >>>>> jchris suggested on IRC that I try a normal doc update and see if
>>> >> >>>>> that fixes it.
>>> >> >>>>>
>>> >> >>>>> It does. After a new doc was created, the dbinfo doc count was
>>> >> >>>>> back to normal.
>>> >> >>>>>
>>> >> >>>>> -Mikeal
>>> >> >>>>>
>>> >> >>>>> On Mon, Aug 9, 2010 at 1:39 PM, Mikeal Rogers <[email protected]> wrote:
>>> >> >>>>>
>>> >> >>>>>> Ok, I pulled down this code and tested against a database with a
>>> >> >>>>>> ton of missing writes right before a single restart.
>>> >> >>>>>>
>>> >> >>>>>> Before restart this was the database:
>>> >> >>>>>>
>>> >> >>>>>> {
>>> >> >>>>>>   db_name: "testwritesdb"
>>> >> >>>>>>   doc_count: 124969
>>> >> >>>>>>   doc_del_count: 0
>>> >> >>>>>>   update_seq: 124969
>>> >> >>>>>>   purge_seq: 0
>>> >> >>>>>>   compact_running: false
>>> >> >>>>>>   disk_size: 54857478
>>> >> >>>>>>   instance_start_time: "1281384140058211"
>>> >> >>>>>>   disk_format_version: 5
>>> >> >>>>>> }
>>> >> >>>>>>
>>> >> >>>>>> After restart it was this:
>>> >> >>>>>>
>>> >> >>>>>> {
>>> >> >>>>>>   db_name: "testwritesdb"
>>> >> >>>>>>   doc_count: 1
>>> >> >>>>>>   doc_del_count: 0
>>> >> >>>>>>   update_seq: 1
>>> >> >>>>>>   purge_seq: 0
>>> >> >>>>>>   compact_running: false
>>> >> >>>>>>   disk_size: 54857478
>>> >> >>>>>>   instance_start_time: "1281384593876026"
>>> >> >>>>>>   disk_format_version: 5
>>> >> >>>>>> }
>>> >> >>>>>>
>>> >> >>>>>> After repair, it's this:
>>> >> >>>>>>
>>> >> >>>>>> {
>>> >> >>>>>>   db_name: "testwritesdb"
>>> >> >>>>>>   doc_count: 1
>>> >> >>>>>>   doc_del_count: 0
>>> >> >>>>>>   update_seq: 124969
>>> >> >>>>>>   purge_seq: 0
>>> >> >>>>>>   compact_running: false
>>> >> >>>>>>   disk_size: 54857820
>>> >> >>>>>>   instance_start_time: "1281385990193289"
>>> >> >>>>>>   disk_format_version: 5
>>> >> >>>>>>   committed_update_seq: 124969
>>> >> >>>>>> }
>>> >> >>>>>>
>>> >> >>>>>> All the sequences are there, and hitting _all_docs shows all the
>>> >> >>>>>> documents, so why is the doc_count only 1 in the dbinfo?
>>> >> >>>>>>
>>> >> >>>>>> -Mikeal
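
Re the doc_count question: dbinfo doesn't walk the documents, it reads the
reduce value stored on the by_id btree root, so until repair recomputes the
reductions the stale count sticks around (that's exactly Filipe's TODO
below). From memory, so the record/field names may not match the branch
exactly:

    %% doc_count in dbinfo comes from the by_id btree's reduction,
    %% roughly a {NotDeleted, Deleted} pair:
    {ok, {DocCount, _DelDocCount}} =
        couch_btree:full_reduce(Db#db.fulldocinfo_by_id_btree).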
>>> >> >>>>>>
>>> >> >>>>>> On Mon, Aug 9, 2010 at 11:53 AM, Filipe David Manana <[email protected]> wrote:
>>> >> >>>>>>
>>> >> >>>>>>> For the record (and people not on IRC), the code at:
>>> >> >>>>>>>
>>> >> >>>>>>> http://github.com/fdmanana/couchdb/commits/db_repair
>>> >> >>>>>>>
>>> >> >>>>>>> is working for at least simple cases. Use
>>> >> >>>>>>> couch_db_repair:repair(DbNameAsString).
>>> >> >>>>>>> There's one TODO: update the reduce values for the by_seq and
>>> >> >>>>>>> by_id BTrees.
>>> >> >>>>>>>
>>> >> >>>>>>> If anyone wants to give some help on this, you're welcome.
>>> >> >>>>>>>
>>> >> >>>>>>> On Mon, Aug 9, 2010 at 6:12 PM, Mikeal Rogers <[email protected]> wrote:
>>> >> >>>>>>>
>>> >> >>>>>>>> I'm starting to create a bunch of test db files that expose
>>> >> >>>>>>>> this bug under different conditions: multiple restarts, across
>>> >> >>>>>>>> compaction, variances in updates that might cause conflicts,
>>> >> >>>>>>>> etc.
>>> >> >>>>>>>>
>>> >> >>>>>>>> http://github.com/mikeal/couchtest
>>> >> >>>>>>>>
>>> >> >>>>>>>> The README outlines what was done to the db's and what needs
>>> >> >>>>>>>> to be recovered.
>>> >> >>>>>>>>
>>> >> >>>>>>>> -Mikeal
>>> >> >>>>>>>>
>>> >> >>>>>>>> On Mon, Aug 9, 2010 at 9:33 AM, Filipe David Manana <[email protected]> wrote:
>>> >> >>>>>>>>
>>> >> >>>>>>>>> On Mon, Aug 9, 2010 at 5:22 PM, Robert Newson <[email protected]> wrote:
>>> >> >>>>>>>>>
>>> >> >>>>>>>>>> Doesn't this bit:
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> -            Db#db{waiting_delayed_commit=nil};
>>> >> >>>>>>>>>> +            Db;
>>> >> >>>>>>>>>> +            % Db#db{waiting_delayed_commit=nil};
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> revert the bug fix?
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> That's intentional, for my local testing.
>>> >> >>>>>>>>> That patch obviously isn't anything close to final; it's too
>>> >> >>>>>>>>> experimental yet.
>>> >> >>>>>>>>>
>>> >> >>>>>>>>>> B.
>>> >> >>>>>>>>>>
>>> >> >>>>>>>>>> On Mon, Aug 9, 2010 at 5:09 PM, Jan Lehnardt <[email protected]> wrote:
>>> >> >>>>>>>>>>> Hi All,
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> Filipe jumped in to start working on the recovery tool, but
>>> >> >>>>>>>>>>> he isn't done yet.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> Here's the current patch:
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> http://www.friendpaste.com/4uMngrym4r7Zz4R0ThSHbz
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> It is not done and very early, but any help on this is
>>> >> >>>>>>>>>>> greatly appreciated.
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> The current state is (in Filipe's words):
>>> >> >>>>>>>>>>> - I can detect that a file needs repair
>>> >> >>>>>>>>>>> - and get the last btree roots from it
>>> >> >>>>>>>>>>> - "only" missing: get last db seq num
>>> >> >>>>>>>>>>> - write new header
>>> >> >>>>>>>>>>> - and deal with the local docs btree (if it exists)
>>> >> >>>>>>>>>>>
>>> >> >>>>>>>>>>> Thanks!
>>> >> >>>>>>>>>>> Jan
>>> >> >>>>>>>>>>> --
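
On the "detect that a file needs repair" item: as I understand it, you walk
the 4 KB block boundaries back from EOF looking for the last block whose
prefix byte marks a header; if live data follows that header, the tail was
never committed. A sketch under those assumptions (helper name made up, and
the marker byte is from memory):

    %% Walk block boundaries backwards until the block-prefix byte
    %% flags a header block.
    find_last_header(_Fd, BlockIdx) when BlockIdx < 0 ->
        no_valid_header;
    find_last_header(Fd, BlockIdx) ->
        Pos = BlockIdx * 4096,
        case gen_server:call(Fd, {pread, Pos, 1}, infinity) of
            {ok, <<1>>} -> {ok, Pos};  % header marker byte
            _ -> find_last_header(Fd, BlockIdx - 1)
        end.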
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> --
>>> >> >>>>>>>>> Filipe David Manana,
>>> >> >>>>>>>>> [email protected]
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> "Reasonable men adapt themselves to the world.
>>> >> >>>>>>>>> Unreasonable men adapt the world to themselves.
>>> >> >>>>>>>>> That's why all progress depends on unreasonable men."
>>> >> >>>>>>>
>>> >> >>>>>>> --
>>> >> >>>>>>> Filipe David Manana,
>>> >> >>>>>>> [email protected]
>>> >> >>>>>>>
>>> >> >>>>>>> "Reasonable men adapt themselves to the world.
>>> >> >>>>>>> Unreasonable men adapt the world to themselves.
>>> >> >>>>>>> That's why all progress depends on unreasonable men."
>>
>> --
>> Filipe David Manana,
>> [email protected]
>>
>> "Reasonable men adapt themselves to the world.
>> Unreasonable men adapt the world to themselves.
>> That's why all progress depends on unreasonable men."
>
> --
> Filipe David Manana,
> [email protected]
>
> "Reasonable men adapt themselves to the world.
> Unreasonable men adapt the world to themselves.
> That's why all progress depends on unreasonable men."
