On Aug 12, 2010, at 12:36 PM, Adam Kocoloski wrote:

> Right, and jchris' db_repair branch includes my patches for DB reader _admin
> access and a more useful progress report in the replication phase of the
> repair.
>
I've updated the repair branch with everyone's code. I think it is faster, thanks to Adam's idea: if we run the merges in reverse order, those near the front of the file are more likely to be no-ops, so less work is done overall.

Mikeal will be testing for correctness. Could others please use it and test for usability as well?

Latest code (with instructions) is here:

http://github.com/jhs/recover-couchdb/

Which points at http://github.com/jchris/couchdb/tree/db_repair for the repair code.

One thing I am not clear about (we need better docs): do we need to replicate the original db to the lost+found db (or vice versa) after recovery is complete?

Chris

> Adam
>
> On Aug 12, 2010, at 3:14 PM, Jason Smith wrote:
>
>> The code is updated with the following changes:
>> 1. Adhere to the lost+found/databasename custom...
>> 2. ...except databases starting with _, which goes into
>> _system/databasename
>> 3. Sync up with jchris's db_repair branch
>>
>> (About #2, I started with _/database but I think it's too easy to miss at
>> the command line.)
>>
>> On Fri, Aug 13, 2010 at 00:52, J Chris Anderson <[email protected]> wrote:
>>
>>> A few bug reports from my testing:
>>>
>>> I launched with this command, as specified in the README:
>>>
>>> find ~/code/couchdb/tmp/lib -type f -name '*.couch' -exec ./recover_couchdb
>>> {} \;
>>>
>>> First of all, it chokes on my _users and _replicator db:
>>>
>>> [info] [<0.2.0>] couch_db_repair for _users - scanning 335961 bytes at 0
>>> [error] [<0.2.0>] couch_db_repair merge node at 332061 {case_clause,
>>> {error,illegal_database_name}}
>>>
>>> That second [error] line is repeated many many times (once per merge I
>>> think). I think the issue is that _users is hard-coded to be OK, but
>>> _users_lost+found is not. So we should do something about that, maybe if a
>>> db-name starts with _ we should call the lost and found a_users_lost+found
>>> (_ sorts at the top, so "a" will be near it and legal).
>>>
>>> When a database has readers defined in the security object, the tool is
>>> unable to open them (the reading part of the repair tool needs to have the
>>> _admin userCtx, not just the writer).
>>>
>>> [debug] [<0.2.0>] Not a reader: UserCtx {user_ctx,null,[],undefined} vs
>>> Names [<<"joe">>] Roles [<<"_admin">>]
>>> escript: exception throw: {unauthorized,<<"You are not authorized to access
>>> this db.">>}
>>> in function couch_db:open/2
>>> in call from couch_db_repair:make_lost_and_found/3
>>> in call from recover_couchdb:main/1
>>> in call from escript:run/2
>>> in call from escript:start/1
>>> in call from init:start_it/1
>>> in call from init:start_em/1
>>>
>>> It would also be helpful if the status lines could say something more than
>>>
>>> [info] [<0.2.0>] couch_db_repair writing 15 updates to bench_lost+found
>>>
>>> Like maybe add a note like "about 23% complete" if at all possible.
>>>
>>> I will patch the first few, I'd love help from someone on the last one.
>>> I'll be on IRC.
>>>
>>> Cheers,
>>> Chris
>>>
>>> On Aug 12, 2010, at 10:18 AM, J Chris Anderson wrote:
>>>
>>>>
>>>> On Aug 11, 2010, at 2:14 PM, Jason Smith wrote:
>>>>
>>>>> Hi, Jason.
>>>>>
>>>>> On Thu, Aug 12, 2010 at 04:14, Jason Smith <[email protected]> wrote:
>>>>>
>>>>>> On Wed, Aug 11, 2010 at 09:52, Adam Kocoloski <[email protected]> wrote:
>>>>>>
>>>>>>> Excellent, thanks for testing. I caught Jason Smith saying on IRC that he
>>>>>>> had packaged the whole thing up as an escript + some .beams. If we can get
>>>>>>> it down to a single file a la rebar that would be a pretty sweet way to
>>>>>>> deliver the repair tool in my opinion.
>>>>>>>
>>>>>>
>>>>>> Please check out http://github.com/jhs/repair-couchdb
>>>>>>
>>>>>
>>>>> I think you mean http://github.com/jhs/recover-couchdb
>>>>>
>>>>
>>>> I think it is important that we package and release this, if it is ready.
>>>> We should link to it from the bug description page, the project home page,
>>>> as well as blog about it, etc. What is the point of working feverishly on a
>>>> recovery tool if we don't go the last mile?
>>>>
>>>> I am testing it now on my database directory to make sure it doesn't harm
>>>> anything (I was never subject to the bug, which is probably where most
>>>> people are, but they might run it anyway.)
>>>>
>>>> As it stands the submodules thing can't be part of the release, we need
>>>> to package it up as a single zip file or something.
>>>>
>>>> Is there anything else that needs to be done before we can release this?
>>>>
>>>> Chris
>>>>
>>>>> --
>>>>> Jason Smith
>>>>> Couchio Hosting
>>>>
>>>
>>>
>>
>>
>> --
>> Jason Smith
>> Couchio Hosting
>
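
For anyone following along, here is a toy Python model of why running the merges in reverse order can mean less work, as mentioned at the top of this message. It is only a sketch and not the couch_db_repair code: the node layout, the `merge_nodes` name, and the integer revision comparison are all assumptions made for illustration. The one idea it borrows from the thread is that nodes near the front of an append-only file tend to be superseded by later ones, so merging them after the later ones is usually a no-op.

```python
# Toy model only: this is NOT how couch_db_repair works internally.
# Assumption: each "node" is (file_offset, {doc_id: revision_number}) and a
# node written later in the append-only file holds equal or newer revisions.

def merge_nodes(nodes, reverse):
    """Merge every node into one target dict and count the writes performed."""
    target = {}
    writes = 0
    for _offset, docs in sorted(nodes, key=lambda n: n[0], reverse=reverse):
        for doc_id, rev in docs.items():
            # Only write when the node carries something newer than the target.
            if rev > target.get(doc_id, 0):
                target[doc_id] = rev
                writes += 1
    return target, writes


nodes = [
    (0, {"a": 1}),
    (100, {"a": 2, "b": 1}),
    (200, {"a": 3, "b": 1, "c": 1}),
]

forward_result, forward_writes = merge_nodes(nodes, reverse=False)
reverse_result, reverse_writes = merge_nodes(nodes, reverse=True)

# Both orders end with the same data; the reverse order needs fewer writes.
print(forward_writes, reverse_writes)  # prints: 5 3
```

In this toy data the two nodes nearest the front of the file contribute nothing once the last node has been merged, which is the no-op effect described above.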

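
On the "about 23% complete" request quoted above, a minimal sketch of the arithmetic in Python. The `progress_note` name and the choice of what to count are hypothetical; the tool could count bytes scanned, merges done, or updates written, whichever it can measure cheaply.

```python
def progress_note(done, total):
    """Return a rough 'about N% complete' note for a status line.

    `done` and `total` are in whatever unit the caller can count
    (an assumption here: bytes scanned out of the file size).
    """
    if total <= 0:
        return "progress unknown"
    # Clamp so a slightly stale total never reports below 0% or above 100%.
    percent = min(100, max(0, round(100 * done / total)))
    return f"about {percent}% complete"


# Example using the file size from the log line quoted in the thread.
print(progress_note(77271, 335961))  # prints: about 23% complete
```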