Summing up what went on in IRC for those who were absent. The latest progress is on Adam's branch at http://github.com/kocolosk/couchdb/tree/db_repair

couch_db_repair:make_lost_and_found/1 attempts to create a new lost+found/DbName database and merge into it every node that is not reachable from anywhere else, that is, not referenced by any other node found in a full file scan or by any header pointer. Currently, make_lost_and_found uses Volker's repair (from the couch_db_repair_b module, also in Adam's branch).

Adam found that the repair process was taking a very long time and that the bottleneck was the couch_file calls, so he added couch_db_repair:find_nodes_quickly/1, which reads the file in 1MB chunks as binaries and processes each chunk to find nodes, instead of scanning back one byte at a time. It is not yet hooked up to the repair mechanism.

Making progress. Go team.

On Mon, Aug 9, 2010 at 13:52, Mikeal Rogers <[email protected]> wrote:
> jchris suggested on IRC that I try a normal doc update and see if that fixes
> it.
>
> It does. After a new doc was created the dbinfo doc count was back to
> normal.
>
> -Mikeal
>
> On Mon, Aug 9, 2010 at 1:39 PM, Mikeal Rogers <[email protected]>wrote:
>
>> Ok, I pulled down this code and tested against a database with a ton of
>> missing writes right before a single restart.
>>
>> Before restart this was the database:
>>
>> {
>> db_name: "testwritesdb"
>> doc_count: 124969
>> doc_del_count: 0
>> update_seq: 124969
>> purge_seq: 0
>> compact_running: false
>> disk_size: 54857478
>> instance_start_time: "1281384140058211"
>> disk_format_version: 5
>> }
>>
>> After restart it was this:
>>
>> {
>> db_name: "testwritesdb"
>> doc_count: 1
>> doc_del_count: 0
>> update_seq: 1
>> purge_seq: 0
>> compact_running: false
>> disk_size: 54857478
>> instance_start_time: "1281384593876026"
>> disk_format_version: 5
>> }
>>
>> After repair, it's this:
>>
>> {
>> db_name: "testwritesdb"
>> doc_count: 1
>> doc_del_count: 0
>> update_seq: 124969
>> purge_seq: 0
>> compact_running: false
>> disk_size: 54857820
>> instance_start_time: "1281385990193289"
>> disk_format_version: 5
>> committed_update_seq: 124969
>> }
>>
>> All the sequences are there and hitting _all_docs shows all the documents
>> so why is the doc_count only 1 in the dbinfo?
>>
>> -Mikeal
>>
>> On Mon, Aug 9, 2010 at 11:53 AM, Filipe David Manana
>> <[email protected]>wrote:
>>
>>> For the record (and people not on IRC), the code at:
>>>
>>> http://github.com/fdmanana/couchdb/commits/db_repair
>>>
>>> is working for at least simple cases. Use
>>> couch_db_repair:repair(DbNameAsString).
>>> There's one TODO: update the reduce values for the by_seq and by_id
>>> BTrees.
>>>
>>> If anyone wants to give some help on this, your welcome.
>>>
>>> On Mon, Aug 9, 2010 at 6:12 PM, Mikeal Rogers <[email protected]>wrote:
>>>
>>> > I'm starting to create a bunch of test db files that expose this bug under
>>> > different conditions like multiple restarts, across compaction, variances in
>>> > updates the might cause conflict, etc.
>>> >
>>> > http://github.com/mikeal/couchtest
>>> >
>>> > The README outlines what was done to the db's and what needs to be
>>> > recovered.
>>> >
>>> > -Mikeal
>>> >
>>> > On Mon, Aug 9, 2010 at 9:33 AM, Filipe David Manana <[email protected]>wrote:
>>> >
>>> > > On Mon, Aug 9, 2010 at 5:22 PM, Robert Newson <[email protected]>wrote:
>>> > >
>>> > > > Doesn't this bit;
>>> > > >
>>> > > > - Db#db{waiting_delayed_commit=nil};
>>> > > > + Db;
>>> > > > + % Db#db{waiting_delayed_commit=nil};
>>> > > >
>>> > > > revert the bug fix?
>>> > > >
>>> > >
>>> > > That's intentional, for my local testing.
>>> > > That patch isn't obviously anything close to final, it's too experimental
>>> > > yet.
>>> > >
>>> > > >
>>> > > > B.
>>> > > >
>>> > > > On Mon, Aug 9, 2010 at 5:09 PM, Jan Lehnardt <[email protected]> wrote:
>>> > > > > Hi All,
>>> > > > >
>>> > > > > Filipe jumped in to start working on the recovery tool, but he isn't done
>>> > > > > yet.
>>> > > > >
>>> > > > > Here's the current patch:
>>> > > > >
>>> > > > > http://www.friendpaste.com/4uMngrym4r7Zz4R0ThSHbz
>>> > > > >
>>> > > > > it is not done and very early, but any help on this is greatly
>>> > > > > appreciated.
>>> > > > >
>>> > > > > The current state is (in Filipe's words):
>>> > > > > - i can detect that a file needs repair
>>> > > > > - and get the last btree roots from it
>>> > > > > - "only" missing: get last db seq num
>>> > > > > - write new header
>>> > > > > - and deal with the local docs btree (if exists)
>>> > > > >
>>> > > > > Thanks!
>>> > > > > Jan
>>> > > > > --
>>> > > > >
>>> > > > >
>>> > > >
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > Filipe David Manana,
>>> > > [email protected]
>>> > >
>>> > > "Reasonable men adapt themselves to the world.
>>> > > Unreasonable men adapt the world to themselves.
>>> > > That's why all progress depends on unreasonable men."
>>> > >
>>> >
>>>
>>>
>>>
>>> --
>>> Filipe David Manana,
>>> [email protected]
>>>
>>> "Reasonable men adapt themselves to the world.
>>> Unreasonable men adapt the world to themselves.
>>> That's why all progress depends on unreasonable men."
>>>
>>
>>
>
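
For anyone who wants to see why reading 1MB chunks helps compared with stepping back one byte at a time, here is a rough Python sketch of the general idea. It is not the code from couch_db_repair:find_nodes_quickly/1. The function name find_marker_offsets and the marker b"kv_node" are placeholders I made up for illustration, and what the real code matches on may well differ. The sketch reads the file one chunk at a time, searches each chunk in memory, and carries the last few bytes over to the next chunk so a marker that straddles a chunk boundary is not missed.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1MB per read, instead of one read per byte


def find_marker_offsets(f, marker, chunk_size=CHUNK_SIZE):
    """Return the absolute file offsets of every occurrence of `marker`.

    `marker` is a hypothetical byte pattern standing in for whatever
    identifies a candidate node in the file.
    """
    if not marker:
        raise ValueError("marker must be non-empty")
    offsets = []
    keep = len(marker) - 1  # bytes carried over between chunks
    tail = b""              # end of the previous buffer
    base = 0                # file offset of the first byte of `buf`
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        # Search the whole buffer in memory; no further I/O per candidate.
        pos = buf.find(marker)
        while pos != -1:
            offsets.append(base + pos)
            pos = buf.find(marker, pos + 1)
        # Keep just enough bytes to catch a marker split across two chunks.
        tail = buf[-keep:] if keep else b""
        base += len(buf) - len(tail)
    return offsets


if __name__ == "__main__":
    data = b"xx" + b"kv_node" + b"y" * 10 + b"kv_node"
    # A tiny chunk size forces both markers to straddle chunk boundaries.
    print(find_marker_offsets(io.BytesIO(data), b"kv_node", chunk_size=4))
    # prints [2, 19]
```

The carried-over tail is shorter than the marker, so it can never contain a complete match on its own and no offset is reported twice.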
