I pulled down the latest code from Adam's branch @ 7080ff72baa329cf6c4be2a79e71a41f744ed93b.

Running timer:tc(couch_db_repair, make_lost_and_found, ["multi_conflict"]).
on a database with 200 lost updates spanning 200 restarts (
http://github.com/mikeal/couchtest/blob/master/multi_conflict.couch ) took
about 101 seconds.

I tried running against a larger database (
http://github.com/mikeal/couchtest/blob/master/testwritesdb.couch ) and I
got this exception:

http://gist.github.com/516491

-Mikeal

On Mon, Aug 9, 2010 at 6:09 PM, Randall Leeds <[email protected]> wrote:

> Summing up what went on in IRC for those who were absent.
>
> The latest progress is on Adam's branch at
> http://github.com/kocolosk/couchdb/tree/db_repair
>
> couch_db_repair:make_lost_and_found/1 attempts to create a new
> lost+found/DbName database to which it merges all nodes not accessible
> from anywhere (any other node found in a full file scan or any header
> pointers).
>
> Currently, make_lost_and_found uses Volker's repair (from
> couch_db_repair_b module, also in Adam's branch).
> Adam found that the bottleneck was couch_file calls and that the
> repair process was taking a very long time so he added
> couch_db_repair:find_nodes_quickly/1 that reads 1MB chunks as binary
> and tries to process it to find nodes instead of scanning back one
> byte at a time. It is currently not hooked up to the repair mechanism.
>
> Making progress. Go team.
>
> On Mon, Aug 9, 2010 at 13:52, Mikeal Rogers <[email protected]> wrote:
> > jchris suggested on IRC that I try a normal doc update and see if that
> > fixes it.
> >
> > It does. After a new doc was created the dbinfo doc count was back to
> > normal.
> >
> > -Mikeal
> >
> > On Mon, Aug 9, 2010 at 1:39 PM, Mikeal Rogers <[email protected]> wrote:
> >
> >> Ok, I pulled down this code and tested against a database with a ton of
> >> missing writes right before a single restart.
> >>
> >> Before restart this was the database:
> >>
> >> {
> >> db_name: "testwritesdb"
> >> doc_count: 124969
> >> doc_del_count: 0
> >> update_seq: 124969
> >> purge_seq: 0
> >> compact_running: false
> >> disk_size: 54857478
> >> instance_start_time: "1281384140058211"
> >> disk_format_version: 5
> >> }
> >>
> >> After restart it was this:
> >>
> >> {
> >> db_name: "testwritesdb"
> >> doc_count: 1
> >> doc_del_count: 0
> >> update_seq: 1
> >> purge_seq: 0
> >> compact_running: false
> >> disk_size: 54857478
> >> instance_start_time: "1281384593876026"
> >> disk_format_version: 5
> >> }
> >>
> >> After repair, it's this:
> >>
> >> {
> >> db_name: "testwritesdb"
> >> doc_count: 1
> >> doc_del_count: 0
> >> update_seq: 124969
> >> purge_seq: 0
> >> compact_running: false
> >> disk_size: 54857820
> >> instance_start_time: "1281385990193289"
> >> disk_format_version: 5
> >> committed_update_seq: 124969
> >> }
> >>
> >> All the sequences are there and hitting _all_docs shows all the
> >> documents so why is the doc_count only 1 in the dbinfo?
> >>
> >> -Mikeal
> >>
> >> On Mon, Aug 9, 2010 at 11:53 AM, Filipe David Manana <[email protected]> wrote:
> >>
> >>> For the record (and people not on IRC), the code at:
> >>>
> >>> http://github.com/fdmanana/couchdb/commits/db_repair
> >>>
> >>> is working for at least simple cases. Use
> >>> couch_db_repair:repair(DbNameAsString).
> >>> There's one TODO: update the reduce values for the by_seq and by_id
> >>> BTrees.
> >>>
> >>> If anyone wants to give some help on this, you're welcome.
> >>>
> >>> On Mon, Aug 9, 2010 at 6:12 PM, Mikeal Rogers <[email protected]> wrote:
> >>>
> >>> > I'm starting to create a bunch of test db files that expose this bug
> >>> > under different conditions like multiple restarts, across compaction,
> >>> > variances in updates that might cause conflict, etc.
> >>> >
> >>> > http://github.com/mikeal/couchtest
> >>> >
> >>> > The README outlines what was done to the db's and what needs to be
> >>> > recovered.
> >>> >
> >>> > -Mikeal
> >>> >
> >>> > On Mon, Aug 9, 2010 at 9:33 AM, Filipe David Manana <[email protected]> wrote:
> >>> >
> >>> > > On Mon, Aug 9, 2010 at 5:22 PM, Robert Newson <[email protected]> wrote:
> >>> > >
> >>> > > > Doesn't this bit;
> >>> > > >
> >>> > > > - Db#db{waiting_delayed_commit=nil};
> >>> > > > + Db;
> >>> > > > + % Db#db{waiting_delayed_commit=nil};
> >>> > > >
> >>> > > > revert the bug fix?
> >>> > > >
> >>> > >
> >>> > > That's intentional, for my local testing.
> >>> > > That patch isn't obviously anything close to final, it's too
> >>> > > experimental yet.
> >>> > >
> >>> > > >
> >>> > > > B.
> >>> > > >
> >>> > > > On Mon, Aug 9, 2010 at 5:09 PM, Jan Lehnardt <[email protected]> wrote:
> >>> > > > > Hi All,
> >>> > > > >
> >>> > > > > Filipe jumped in to start working on the recovery tool, but he
> >>> > > > > isn't done yet.
> >>> > > > >
> >>> > > > > Here's the current patch:
> >>> > > > >
> >>> > > > > http://www.friendpaste.com/4uMngrym4r7Zz4R0ThSHbz
> >>> > > > >
> >>> > > > > it is not done and very early, but any help on this is greatly
> >>> > > > > appreciated.
> >>> > > > >
> >>> > > > > The current state is (in Filipe's words):
> >>> > > > > - i can detect that a file needs repair
> >>> > > > > - and get the last btree roots from it
> >>> > > > > - "only" missing: get last db seq num
> >>> > > > > - write new header
> >>> > > > > - and deal with the local docs btree (if exists)
> >>> > > > >
> >>> > > > > Thanks!
> >>> > > > > Jan
> >>> > > > > --
> >>> > > > >
> >>> > > > >
> >>> > > >
> >>> > >
> >>> > >
> >>> > >
> >>> > > --
> >>> > > Filipe David Manana,
> >>> > > [email protected]
> >>> > >
> >>> > > "Reasonable men adapt themselves to the world.
> >>> > > Unreasonable men adapt the world to themselves.
> >>> > > That's why all progress depends on unreasonable men."
> >>> > >
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Filipe David Manana,
> >>> [email protected]
> >>>
> >>> "Reasonable men adapt themselves to the world.
> >>> Unreasonable men adapt the world to themselves.
> >>> That's why all progress depends on unreasonable men."
> >>>
> >>
> >>
> >
>
