With Randall's help we hooked the new node scanner up to the lost+found DB generator. It seems to work well enough for small DBs; for large DBs with lots of missing nodes, the O(N^2) complexity of the problem catches up with the code and generating the lost+found DB takes quite some time. Mikeal is running tests tonight. The algorithm appears to be CPU-bound, so a little parallelization may be warranted.
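If we do parallelize, the cheapest win is probably to fan the scan out over a few processes by byte range. A minimal sketch of that shape; scan_range/3 stands in for a hypothetical range-limited variant of the scanner, no such function exists on the branch yet:

    -module(repair_pscan).
    -export([pscan/3]).

    %% Sketch only: split the file into NumWorkers contiguous byte ranges
    %% and scan them concurrently, then combine the candidate node lists.
    pscan(Fd, FileSize, NumWorkers) ->
        ChunkSize = FileSize div NumWorkers + 1,
        Parent = self(),
        Pids = [spawn_link(fun() ->
                    Start = I * ChunkSize,
                    Len = erlang:max(0, erlang:min(ChunkSize, FileSize - Start)),
                    Parent ! {self(), scan_range(Fd, Start, Len)}
                end) || I <- lists:seq(0, NumWorkers - 1)],
        %% Collect in spawn order so the combined result is deterministic.
        lists:append([receive {Pid, Nodes} -> Nodes end || Pid <- Pids]).

    %% Placeholder so the module compiles; a real version would scan the
    %% given range for candidate btree nodes.
    scan_range(_Fd, _Start, _Len) ->
        [].

One caveat: the workers would each want their own file descriptor, since funneling them all through a single couch_file gen_server would just move the serialization point.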
http://github.com/kocolosk/couchdb/tree/db_repair

Adam

(I sent this previous update to myself instead of the list, so I'll forward it here ...)

On Aug 10, 2010, at 12:01 AM, Adam Kocoloski wrote:

> On Aug 9, 2010, at 10:10 PM, Adam Kocoloski wrote:
>
>> Right, make_lost_and_found still relies on code which reads through
>> couch_file one byte at a time; that's the cause of the slowness. The
>> newer scanner will improve that pretty dramatically, and we can tune it
>> further by increasing the length of the pattern that we match when
>> looking for kp/kv_node terms in the files, at the expense of some extra
>> complexity dealing with the block prefixes (currently it does a 1-byte
>> match, which as I understand it cannot be split across blocks).
>
> The scanner now looks for a 7-byte match, unless it is within 6 bytes of
> a block boundary, in which case it looks for the longest possible match
> at that position. The more specific match condition greatly reduces the
> number of calls to couch_file, and thus boosts the throughput. On my
> laptop it can scan the testwritesdb.couch from Mikeal's couchtest repo
> (52 MB) in 18 seconds.
>
>> Regarding the file_corruption error on the larger file, I think this is
>> something we will just naturally trigger when we take a guess that
>> random positions in a file are actually the beginning of a term. I
>> think our best recourse here is to return {error, file_corruption} from
>> couch_file but leave the gen_server up and running instead of
>> terminating it. That way the repair code can ignore the error and keep
>> moving without having to reopen the file.
>
> I committed this change (to my db_repair branch) after consulting with
> Chris. The longer match condition makes these spurious file_corruption
> triggers much less likely, but I think it's still a good thing not to
> crash the server when they happen.
>
>> Next steps as I understand them: Randall is working on integrating the
>> in-memory scanner into Volker's code that finds all the dangling by_id
>> nodes. I'm working on making sure that the scanner identifies btree
>> node candidates which span block prefixes, and on improving its
>> pattern-matching.
>
> Latest from my end:
> http://github.com/kocolosk/couchdb/tree/db_repair
>
>> Adam
>>
>> On Aug 9, 2010, at 9:50 PM, Mikeal Rogers wrote:
>>
>>> I pulled down the latest code from Adam's branch @
>>> 7080ff72baa329cf6c4be2a79e71a41f744ed93b.
>>>
>>> Running timer:tc(couch_db_repair, make_lost_and_found,
>>> ["multi_conflict"]). on a database with 200 lost updates spanning 200
>>> restarts (http://github.com/mikeal/couchtest/blob/master/multi_conflict.couch)
>>> took about 101 seconds.
>>>
>>> I tried running against a larger database
>>> (http://github.com/mikeal/couchtest/blob/master/testwritesdb.couch)
>>> and I got this exception:
>>>
>>> http://gist.github.com/516491
>>>
>>> -Mikeal
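That gist is the file_corruption case discussed above: a guessed offset that doesn't decode to a real term. With the committed change, couch_file hands back {error, file_corruption} instead of dying, so the repair loop can treat a bad candidate as a simple miss. A minimal sketch of that caller side; couch_file:pread_term/2 is the real API, while the module, helper name, and the 2-tuple kp_node/kv_node filter shape are illustrative, not the code on the branch:

    -module(repair_filter).
    -export([live_nodes/2]).

    %% Probe each candidate offset and keep only the ones that decode to
    %% btree nodes, treating {error, file_corruption} as a miss rather
    %% than a crash.
    live_nodes(Fd, CandidateOffsets) ->
        lists:foldl(fun(Pos, Acc) ->
            case couch_file:pread_term(Fd, Pos) of
                {ok, {kp_node, _} = Node} -> [{Pos, Node} | Acc];
                {ok, {kv_node, _} = Node} -> [{Pos, Node} | Acc];
                {ok, _SomeOtherTerm}      -> Acc;
                {error, file_corruption}  -> Acc
            end
        end, [], CandidateOffsets).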
>>> On Mon, Aug 9, 2010 at 6:09 PM, Randall Leeds <[email protected]> wrote:
>>>
>>>> Summing up what went on in IRC for those who were absent.
>>>>
>>>> The latest progress is on Adam's branch at
>>>> http://github.com/kocolosk/couchdb/tree/db_repair
>>>>
>>>> couch_db_repair:make_lost_and_found/1 attempts to create a new
>>>> lost+found/DbName database to which it merges all nodes not accessible
>>>> from anywhere (any other node found in a full file scan or any header
>>>> pointers).
>>>>
>>>> Currently, make_lost_and_found uses Volker's repair (from the
>>>> couch_db_repair_b module, also in Adam's branch).
>>>>
>>>> Adam found that the bottleneck was couch_file calls and that the
>>>> repair process was taking a very long time, so he added
>>>> couch_db_repair:find_nodes_quickly/1, which reads 1MB chunks as binary
>>>> and scans them in memory to find nodes instead of reading back one
>>>> byte at a time. It is currently not hooked up to the repair mechanism.
>>>>
>>>> Making progress. Go team.
>>>>
>>>> On Mon, Aug 9, 2010 at 13:52, Mikeal Rogers <[email protected]> wrote:
>>>>
>>>>> jchris suggested on IRC that I try a normal doc update and see if
>>>>> that fixes it.
>>>>>
>>>>> It does. After a new doc was created the dbinfo doc count was back to
>>>>> normal.
>>>>>
>>>>> -Mikeal
>>>>>
>>>>> On Mon, Aug 9, 2010 at 1:39 PM, Mikeal Rogers <[email protected]> wrote:
>>>>>
>>>>>> Ok, I pulled down this code and tested against a database with a ton
>>>>>> of missing writes right before a single restart.
>>>>>>
>>>>>> Before restart this was the database:
>>>>>>
>>>>>> {
>>>>>>   db_name: "testwritesdb"
>>>>>>   doc_count: 124969
>>>>>>   doc_del_count: 0
>>>>>>   update_seq: 124969
>>>>>>   purge_seq: 0
>>>>>>   compact_running: false
>>>>>>   disk_size: 54857478
>>>>>>   instance_start_time: "1281384140058211"
>>>>>>   disk_format_version: 5
>>>>>> }
>>>>>>
>>>>>> After restart it was this:
>>>>>>
>>>>>> {
>>>>>>   db_name: "testwritesdb"
>>>>>>   doc_count: 1
>>>>>>   doc_del_count: 0
>>>>>>   update_seq: 1
>>>>>>   purge_seq: 0
>>>>>>   compact_running: false
>>>>>>   disk_size: 54857478
>>>>>>   instance_start_time: "1281384593876026"
>>>>>>   disk_format_version: 5
>>>>>> }
>>>>>>
>>>>>> After repair, it's this:
>>>>>>
>>>>>> {
>>>>>>   db_name: "testwritesdb"
>>>>>>   doc_count: 1
>>>>>>   doc_del_count: 0
>>>>>>   update_seq: 124969
>>>>>>   purge_seq: 0
>>>>>>   compact_running: false
>>>>>>   disk_size: 54857820
>>>>>>   instance_start_time: "1281385990193289"
>>>>>>   disk_format_version: 5
>>>>>>   committed_update_seq: 124969
>>>>>> }
>>>>>>
>>>>>> All the sequences are there and hitting _all_docs shows all the
>>>>>> documents, so why is the doc_count only 1 in the dbinfo?
>>>>>>
>>>>>> -Mikeal
>>>>>>
>>>>>> On Mon, Aug 9, 2010 at 11:53 AM, Filipe David Manana
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> For the record (and people not on IRC), the code at:
>>>>>>>
>>>>>>> http://github.com/fdmanana/couchdb/commits/db_repair
>>>>>>>
>>>>>>> is working for at least simple cases. Use
>>>>>>> couch_db_repair:repair(DbNameAsString).
>>>>>>> There's one TODO: update the reduce values for the by_seq and by_id
>>>>>>> BTrees.
>>>>>>>
>>>>>>> If anyone wants to give some help on this, you're welcome.
>>>>>>>
>>>>>>> On Mon, Aug 9, 2010 at 6:12 PM, Mikeal Rogers <[email protected]> wrote:
>>>>>>>
>>>>>>>> I'm starting to create a bunch of test db files that expose this
>>>>>>>> bug under different conditions like multiple restarts, across
>>>>>>>> compaction, variances in updates that might cause conflicts, etc.
>>>>>>>>
>>>>>>>> http://github.com/mikeal/couchtest
>>>>>>>>
>>>>>>>> The README outlines what was done to the db's and what needs to be
>>>>>>>> recovered.
>>>>>>>>
>>>>>>>> -Mikeal
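Filipe's TODO about the reduce values is very likely also the answer to Mikeal's doc_count puzzle: dbinfo doesn't count documents by walking the tree, it reads the reduction stored in the by_id btree root, so a repaired tree with stale reductions reports a stale count until the next write recomputes it, which is exactly what jchris's normal-doc-update trick showed. A sketch of where the number comes from; the record field and the {NotDeleted, Deleted} reduction shape follow CouchDB 1.x internals and are assumptions here:

    -module(repair_info).
    -export([doc_count/1]).

    -include("couch_db.hrl").

    %% doc_count is read from the reduce value cached in the by_id btree
    %% root, not computed by a tree walk; stale reductions => stale count.
    doc_count(#db{fulldocinfo_by_id_btree = IdBt}) ->
        {ok, {NotDeleted, _Deleted}} = couch_btree:full_reduce(IdBt),
        NotDeleted.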
>>>>>>>> On Mon, Aug 9, 2010 at 9:33 AM, Filipe David Manana
>>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> On Mon, Aug 9, 2010 at 5:22 PM, Robert Newson
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Doesn't this bit;
>>>>>>>>>>
>>>>>>>>>> -    Db#db{waiting_delayed_commit=nil};
>>>>>>>>>> +    Db;
>>>>>>>>>> +    % Db#db{waiting_delayed_commit=nil};
>>>>>>>>>>
>>>>>>>>>> revert the bug fix?
>>>>>>>>>
>>>>>>>>> That's intentional, for my local testing.
>>>>>>>>> That patch obviously isn't anything close to final; it's still
>>>>>>>>> too experimental.
>>>>>>>>>
>>>>>>>>>> B.
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 9, 2010 at 5:09 PM, Jan Lehnardt <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> Filipe jumped in to start working on the recovery tool, but he
>>>>>>>>>>> isn't done yet.
>>>>>>>>>>>
>>>>>>>>>>> Here's the current patch:
>>>>>>>>>>>
>>>>>>>>>>> http://www.friendpaste.com/4uMngrym4r7Zz4R0ThSHbz
>>>>>>>>>>>
>>>>>>>>>>> It is not done and very early, but any help on this is greatly
>>>>>>>>>>> appreciated.
>>>>>>>>>>>
>>>>>>>>>>> The current state is (in Filipe's words):
>>>>>>>>>>> - I can detect that a file needs repair
>>>>>>>>>>> - and get the last btree roots from it
>>>>>>>>>>> - "only" missing: get the last db seq num
>>>>>>>>>>> - write a new header
>>>>>>>>>>> - and deal with the local docs btree (if it exists)
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>> Jan
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Filipe David Manana,
>>>>>>>>> [email protected]
>>>>>>>>>
>>>>>>>>> "Reasonable men adapt themselves to the world.
>>>>>>>>> Unreasonable men adapt the world to themselves.
>>>>>>>>> That's why all progress depends on unreasonable men."
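For completeness, here is the rough shape of the chunked scan Randall describes above: read 1MB at a time and pattern-match in memory instead of issuing a couch_file call per byte. The 7-byte magic below, the term_to_binary prefix shared by {kp_node, _} and {kv_node, _} tuples, is an educated guess at the sort of pattern the branch matches, and the sketch ignores the block-prefix complication Adam mentioned:

    -module(chunk_scan).
    -export([scan/2]).

    -define(CHUNK, 1024 * 1024).
    %% term_to_binary prefix common to {kp_node, _} and {kv_node, _}:
    %% 131 (version), 104 2 (2-tuple), 100 0 7 (7-char atom), $k.
    %% An assumption about the on-disk format, for illustration only.
    -define(MAGIC, <<131, 104, 2, 100, 0, 7, $k>>).

    scan(IoDev, FileSize) ->
        scan(IoDev, 0, FileSize, <<>>, []).

    scan(_IoDev, Pos, FileSize, _Tail, Acc) when Pos >= FileSize ->
        lists:append(lists:reverse(Acc));
    scan(IoDev, Pos, FileSize, Tail, Acc) ->
        ReadLen = erlang:min(?CHUNK, FileSize - Pos),
        {ok, Bin} = file:pread(IoDev, Pos, ReadLen),
        %% Prepend the last 6 bytes of the previous chunk so a match that
        %% straddles a chunk edge is still seen; 6 = byte_size(?MAGIC) - 1,
        %% short enough that no complete match is ever counted twice.
        Buffer = <<Tail/binary, Bin/binary>>,
        Base = Pos - byte_size(Tail),
        Found = [Base + O || {O, _L} <- binary:matches(Buffer, ?MAGIC)],
        TailLen = erlang:min(6, byte_size(Buffer)),
        Tail1 = binary:part(Buffer, byte_size(Buffer) - TailLen, TailLen),
        scan(IoDev, Pos + ReadLen, FileSize, Tail1, [Found | Acc]).

Open the file with {ok, IoDev} = file:open(Path, [read, raw, binary]) and take FileSize from filelib:file_size(Path); the result is a list of candidate offsets to hand to the repair code, which must still verify each one, since a matching prefix does not guarantee a live node.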
