I tested the latest code in recover-couchdb and it looks great. -Mikeal
On Thu, Aug 12, 2010 at 2:33 PM, J Chris Anderson <[email protected]> wrote:

>
> On Aug 12, 2010, at 2:15 PM, J Chris Anderson wrote:
>
> >
> > On Aug 12, 2010, at 12:36 PM, Adam Kocoloski wrote:
> >
> >> Right, and jchris' db_repair branch includes my patches for DB reader _admin access and a more useful progress report in the replication phase of the repair.
> >>
> >
> > I've updated the repair branch with everyone's code. I think it is faster, due to Adam's idea that if we run the merges in reverse order, those near the front of the file are more likely to be no-ops, so less work is done overall.
> >
> > Mikeal will be testing for correctness. Could others please use it and test for usability as well? Latest code (with instructions) is here:
> >
> > http://github.com/jhs/recover-couchdb/
> >
> > Which points at http://github.com/jchris/couchdb/tree/db_repair for the repair code.
> >
> > One thing I am not clear about (need better docs) is, do we need to replicate the original db to the lost+found db (or vice-versa), after recovery is complete?
> >
>
> Also, we should be clear about what the semantics for this are. It can potentially introduce conflicts if some writes were repeated after restarts. Should it always be a noop on dbs that are clean w/r/t the bug?
>
> Chris
>
> > Chris
> >
> >> Adam
> >>
> >> On Aug 12, 2010, at 3:14 PM, Jason Smith wrote:
> >>
> >>> The code is updated with the following changes:
> >>> 1. Adhere to the lost+found/databasename custom...
> >>> 2. ...except databases starting with _, which go into _system/databasename
> >>> 3. Sync up with jchris's db_repair branch
> >>>
> >>> (About #2, I started with _/database but I think it's too easy to miss at the command line.)
> >>>
> >>> On Fri, Aug 13, 2010 at 00:52, J Chris Anderson <[email protected]> wrote:
> >>>
> >>>> A few bug reports from my testing:
> >>>>
> >>>> I launched with this command, as specified in the README:
> >>>>
> >>>> find ~/code/couchdb/tmp/lib -type f -name '*.couch' -exec ./recover_couchdb {} \;
> >>>>
> >>>> First of all, it chokes on my _users and _replicator db:
> >>>>
> >>>> [info] [<0.2.0>] couch_db_repair for _users - scanning 335961 bytes at 0
> >>>> [error] [<0.2.0>] couch_db_repair merge node at 332061 {case_clause, {error,illegal_database_name}}
> >>>>
> >>>> That second [error] line is repeated many many times (once per merge I think). I think the issue is that _users is hard-coded to be OK, but _users_lost+found is not. So we should do something about that, maybe if a db-name starts with _ we should call the lost and found a_users_lost+found (_ sorts at the top, so "a" will be near it and legal).
> >>>>
> >>>> When a database has readers defined in the security object, the tool is unable to open them (the reading part of the repair tool needs to have the _admin userCtx, not just the writer).
> >>>>
> >>>> [debug] [<0.2.0>] Not a reader: UserCtx {user_ctx,null,[],undefined} vs Names [<<"joe">>] Roles [<<"_admin">>]
> >>>> escript: exception throw: {unauthorized,<<"You are not authorized to access this db.">>}
> >>>> in function couch_db:open/2
> >>>> in call from couch_db_repair:make_lost_and_found/3
> >>>> in call from recover_couchdb:main/1
> >>>> in call from escript:run/2
> >>>> in call from escript:start/1
> >>>> in call from init:start_it/1
> >>>> in call from init:start_em/1
> >>>>
> >>>> It would also be helpful if the status lines could say something more than
> >>>>
> >>>> [info] [<0.2.0>] couch_db_repair writing 15 updates to bench_lost+found
> >>>>
> >>>> Like maybe add a note like "about 23% complete" if at all possible.
> >>>>
> >>>> I will patch the first few, I'd love help from someone on the last one. I'll be on IRC.
> >>>>
> >>>> Cheers,
> >>>> Chris
> >>>>
> >>>> On Aug 12, 2010, at 10:18 AM, J Chris Anderson wrote:
> >>>>
> >>>>>
> >>>>> On Aug 11, 2010, at 2:14 PM, Jason Smith wrote:
> >>>>>
> >>>>>> Hi, Jason.
> >>>>>>
> >>>>>> On Thu, Aug 12, 2010 at 04:14, Jason Smith <[email protected]> wrote:
> >>>>>>
> >>>>>>> On Wed, Aug 11, 2010 at 09:52, Adam Kocoloski <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Excellent, thanks for testing. I caught Jason Smith saying on IRC that he had packaged the whole thing up as an escript + some .beams. If we can get it down to a single file a la rebar that would be a pretty sweet way to deliver the repair tool in my opinion.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Please check out http://github.com/jhs/repair-couchdb
> >>>>>>>
> >>>>>>
> >>>>>> I think you mean http://github.com/jhs/recover-couchdb
> >>>>>>
> >>>>>
> >>>>> I think it is important that we package and release this, if it is ready.
> >>>>> We should link to it from the bug description page, the project home page, as well as blog about it, etc. What is the point of working feverishly on a recovery tool if we don't go the last mile?
> >>>>>
> >>>>> I am testing it now on my database directory to make sure it doesn't harm anything (I was never subject to the bug, which is probably where most people are, but they might run it anyway.)
> >>>>>
> >>>>> As it stands the submodules thing can't be part of the release, we need to package it up as a single zip file or something.
> >>>>>
> >>>>> Is there anything else that needs to be done before we can release this?
> >>>>>
> >>>>> Chris
> >>>>>
> >>>>>> --
> >>>>>> Jason Smith
> >>>>>> Couchio Hosting
> >>>>>
> >>>>
> >>>
> >>> --
> >>> Jason Smith
> >>> Couchio Hosting
> >>
> >
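
On the illegal_database_name error quoted in this thread: CouchDB's documented rule is that a database name must begin with a lowercase letter and may then contain only lowercase letters, digits, and the characters _ $ ( ) + - /. That would explain why _users_lost+found is rejected while a name such as a_users_lost+found passes. What follows is a minimal Python sketch of that check, not the Erlang validation CouchDB actually runs. The function name and the SYSTEM_DBS set are hypothetical; the set only models the remark in the thread that _users is hard-coded to be OK.

```python
import re

# Documented CouchDB naming rule: a lowercase letter first, then lowercase
# letters, digits, or any of _ $ ( ) + - /
DB_NAME_RE = re.compile(r"[a-z][a-z0-9_$()+/-]*")

# Assumption for illustration: names the server accepts despite the leading "_".
SYSTEM_DBS = {"_users"}


def is_legal_db_name(name: str) -> bool:
    """Return True if `name` passes the rule above or is a special-cased system db."""
    return name in SYSTEM_DBS or DB_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("_users", "_users_lost+found",
                      "a_users_lost+found", "bench_lost+found"):
        print(candidate, is_legal_db_name(candidate))
```

Under this rule a lost+found name derived by appending a suffix to a system database inherits the leading underscore and fails, so any renaming scheme has to put a lowercase letter first.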
