Found one issue: we weren't picking up design docs because it didn't have admin privileges.
Adam fixed it and pushed, and I've verified that it works now.

I wrote a little node script to show all recovered documents and expose any
documents that didn't make it into lost+found (a rough sketch of the same
check is appended after the quoted thread below).

http://github.com/mikeal/couchtest/blob/master/validate.js

Requires request, `npm install request`.

I'm now running recover on all the test dbs I have and running the
validation script against them.

-Mikeal

On Tue, Aug 10, 2010 at 1:34 PM, Mikeal Rogers <[email protected]> wrote:

> I have some timing numbers for the new code.
>
> multi_conflict has 200 lost documents and 201 documents total after
> recovery.
> 1> timer:tc(couch_db_repair, make_lost_and_found, ["multi_conflict"]).
> {25217069,ok}
> 25 seconds
>
> Something funky is going on here. Investigating.
> 1> timer:tc(couch_db_repair, make_lost_and_found,
> ["multi_conflict_with_attach"]).
> {654782,ok}
> .6 seconds
>
> This db has 124969 documents in it.
> 1> timer:tc(couch_db_repair, make_lost_and_found, ["testwritesdb"]).
> {1381969304,ok}
> 23 minutes
>
> This database is about 500 megs, with 46660 documents before recovery and
> 46801 after.
> 1> timer:tc(couch_db_repair, make_lost_and_found, ["prod"]).
> {2329669113,ok}
> 38.8 minutes
>
> -Mikeal
>
> On Tue, Aug 10, 2010 at 12:06 PM, Adam Kocoloski <[email protected]> wrote:
>
>> Good idea. Now we've got
>>
>> > [info] [<0.33.0>] couch_db_repair for testwritesdb - scanning 1048576 bytes at 1380102
>> > [info] [<0.33.0>] couch_db_repair for testwritesdb - scanning 1048576 bytes at 331526
>> > [info] [<0.33.0>] couch_db_repair for testwritesdb - scanning 331526 bytes at 0
>> > [info] [<0.33.0>] couch_db_repair writing 12 updates to lost+found/testwritesdb
>> > [info] [<0.33.0>] couch_db_repair writing 9 updates to lost+found/testwritesdb
>> > [info] [<0.33.0>] couch_db_repair writing 8 updates to lost+found/testwritesdb
>>
>> Adam
>>
>> On Aug 10, 2010, at 2:29 PM, Robert Newson wrote:
>>
>> > It took 20 minutes before the first 'update' line came out, but now
>> > seems to be recovering smoothly. Machine load is back down to sane
>> > levels.
>> >
>> > Suggest feedback during the hunting phase.
>> >
>> > B.
>> >
>> > On Tue, Aug 10, 2010 at 7:11 PM, Adam Kocoloski <[email protected]> wrote:
>> >> Thanks for the crosscheck. I'm not aware of anything in the node
>> >> finder that would cause it to struggle mightily with healthy DBs. It
>> >> pretty much ignores the health of the DB, in fact. Would be
>> >> interested to hear more.
>> >>
>> >> On Aug 10, 2010, at 1:59 PM, Robert Newson wrote:
>> >>
>> >>> I verified the new code's ability to repair the testwritesdb. System
>> >>> load was smooth from start to finish.
>> >>>
>> >>> I started a further test on a different (healthy) database and
>> >>> system load was severe again, just collecting the roots (the
>> >>> lost+found db was not yet created when I aborted the attempt). I
>> >>> suspect the fact that it's healthy is the issue, so if I'm right,
>> >>> perhaps a warning would be useful.
>> >>>
>> >>> B.
>> >>>
>> >>> On Tue, Aug 10, 2010 at 6:53 PM, Adam Kocoloski <[email protected]> wrote:
>> >>>> Another update. This morning I took a different tack and, rather
>> >>>> than try to find root nodes, I just looked for all kv_nodes in the
>> >>>> file and treated each of those as a separate virtual DB to be
>> >>>> replicated. This reduces the algorithmic complexity of the repair,
>> >>>> and it looks like testwritesdb repairs in ~30 minutes or so. Also,
>> >>>> this method results in the lost+found DB containing every document,
>> >>>> not just the missing ones.
>> >>>>
>> >>>> My branch does not currently include Randall's parallelization of
>> >>>> the replications. It's still CPU-limited, so that may be a
>> >>>> worthwhile optimization. On the other hand, I think we may be
>> >>>> reaching a stage at which performance for this repair tool is
>> >>>> 'good enough', and pmaps can make error handling a bit dicey.
>> >>>>
>> >>>> In short, I think this tool is now in good shape.
>> >>>>
>> >>>> http://github.com/kocolosk/couchdb/tree/db_repair
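For anyone who wants to run the same kind of check without pulling down
validate.js, here is a minimal sketch of what it does. This is not Mikeal's
actual script; the CouchDB URL, the lost+found/<dbname> naming, and the use of
_all_docs for the comparison are assumptions based on this thread. It prints
every doc id found in the lost+found copy and then lists any ids from the
original db that are missing from it.

// validate-sketch.js -- a guess at the kind of check validate.js performs,
// not Mikeal's actual script. Assumes CouchDB on localhost:5984 and a
// lost+found/<dbname> recovery database, as seen in the log lines above.
var request = require('request'); // npm install request

var couch = 'http://localhost:5984/';
var db = process.argv[2] || 'testwritesdb';             // original db (assumed default)
var lostFound = encodeURIComponent('lost+found/' + db); // -> lost%2Bfound%2Ftestwritesdb

// Fetch every doc id in a database via _all_docs.
function allDocIds (dbPath, cb) {
  request(couch + dbPath + '/_all_docs', function (err, resp, body) {
    if (err || resp.statusCode !== 200) throw err || new Error(body);
    cb(JSON.parse(body).rows.map(function (row) { return row.id; }));
  });
}

allDocIds(encodeURIComponent(db), function (originalIds) {
  allDocIds(lostFound, function (recoveredIds) {
    // Show all recovered documents.
    recoveredIds.forEach(function (id) { console.log('recovered: ' + id); });

    // Expose any documents that didn't make it into lost+found.
    var missing = originalIds.filter(function (id) {
      return recoveredIds.indexOf(id) === -1;
    });
    console.log(missing.length + ' document(s) missing from lost+found');
    missing.forEach(function (id) { console.log('MISSING: ' + id); });
  });
});

Run it as `node validate-sketch.js <dbname>` against a CouchDB that already
has the repaired lost+found database in place.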
