On Aug 12, 2010, at 2:15 PM, J Chris Anderson wrote:

>
> On Aug 12, 2010, at 12:36 PM, Adam Kocoloski wrote:
>
>> Right, and jchris' db_repair branch includes my patches for DB reader _admin
>> access and a more useful progress report in the replication phase of the
>> repair.
>>
>
> I've updated the repair branch with everyone's code. I think it is faster,
> due to Adam's idea that if we run the merges in reverse order, those near the
> front of the file are more likely to be no-ops, so less work is done overall.
>
> Mikeal will be testing for correctness. Could others please use it and test
> for usability as well? Latest code (with instructions) is here:
>
> http://github.com/jhs/recover-couchdb/
>
> Which points at http://github.com/jchris/couchdb/tree/db_repair for the
> repair code.
>
> One thing I am not clear about (need better docs) is, do we need to replicate
> the original db to the lost+found db (or vice-versa), after recovery is
> complete?
>

Also, we should be clear about what the semantics for this are. It can
potentially introduce conflicts if some writes were repeated after restarts.
Should it always be a noop on dbs that are clean w/r/t the bug?

Chris

> Chris
>
>> Adam
>>
>> On Aug 12, 2010, at 3:14 PM, Jason Smith wrote:
>>
>>> The code is updated with the following changes:
>>> 1. Adhere to the lost+found/databasename custom...
>>> 2. ...except databases starting with _, which go into
>>> _system/databasename
>>> 3. Sync up with jchris's db_repair branch
>>>
>>> (About #2, I started with _/database but I think it's too easy to miss at
>>> the command line.)
>>>
>>> On Fri, Aug 13, 2010 at 00:52, J Chris Anderson <[email protected]> wrote:
>>>
>>>> A few bug reports from my testing:
>>>>
>>>> I launched with this command, as specified in the README:
>>>>
>>>> find ~/code/couchdb/tmp/lib -type f -name '*.couch' -exec ./recover_couchdb {} \;
>>>>
>>>>
>>>> First of all, it chokes on my _users and _replicator db:
>>>>
>>>> [info] [<0.2.0>] couch_db_repair for _users - scanning 335961 bytes at 0
>>>> [error] [<0.2.0>] couch_db_repair merge node at 332061 {case_clause, {error,illegal_database_name}}
>>>>
>>>> That second [error] line is repeated many many times (once per merge I
>>>> think). I think the issue is that _users is hard-coded to be OK, but
>>>> _users_lost+found is not. So we should do something about that, maybe if a
>>>> db-name starts with _ we should call the lost and found a_users_lost+found
>>>> (_ sorts at the top, so "a" will be near it and legal).
>>>>
>>>>
>>>> When a database has readers defined in the security object, the tool is
>>>> unable to open them (the reading part of the repair tool needs to have the
>>>> _admin userCtx, not just the writer).
>>>>
>>>> [debug] [<0.2.0>] Not a reader: UserCtx {user_ctx,null,[],undefined} vs Names [<<"joe">>] Roles [<<"_admin">>]
>>>> escript: exception throw: {unauthorized,<<"You are not authorized to access this db.">>}
>>>>   in function couch_db:open/2
>>>>   in call from couch_db_repair:make_lost_and_found/3
>>>>   in call from recover_couchdb:main/1
>>>>   in call from escript:run/2
>>>>   in call from escript:start/1
>>>>   in call from init:start_it/1
>>>>   in call from init:start_em/1
>>>>
>>>>
>>>> It would also be helpful if the status lines could say something more than
>>>>
>>>> [info] [<0.2.0>] couch_db_repair writing 15 updates to bench_lost+found
>>>>
>>>> Like maybe add a note like "about 23% complete" if at all possible.
>>>>
>>>>
>>>> I will patch the first few, I'd love help from someone on the last one.
>>>> I'll be on IRC.
>>>>
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>>
>>>> On Aug 12, 2010, at 10:18 AM, J Chris Anderson wrote:
>>>>
>>>>>
>>>>> On Aug 11, 2010, at 2:14 PM, Jason Smith wrote:
>>>>>
>>>>>> Hi, Jason.
>>>>>>
>>>>>> On Thu, Aug 12, 2010 at 04:14, Jason Smith <[email protected]> wrote:
>>>>>>
>>>>>>> On Wed, Aug 11, 2010 at 09:52, Adam Kocoloski <[email protected]> wrote:
>>>>>>>
>>>>>>>> Excellent, thanks for testing. I caught Jason Smith saying on IRC that he
>>>>>>>> had packaged the whole thing up as an escript + some .beams. If we can get
>>>>>>>> it down to a single file a la rebar that would be a pretty sweet way to
>>>>>>>> deliver the repair tool in my opinion.
>>>>>>>>
>>>>>>>
>>>>>>> Please check out http://github.com/jhs/repair-couchdb
>>>>>>>
>>>>>>
>>>>>> I think you mean http://github.com/jhs/recover-couchdb
>>>>>>
>>>>>
>>>>> I think it is important that we package and release this, if it is ready.
>>>>> We should link to it from the bug description page, the project home page,
>>>>> as well as blog about it, etc.
>>>>> What is the point of working feverishly on a recovery tool if we don't go
>>>>> the last mile?
>>>>>
>>>>> I am testing it now on my database directory to make sure it doesn't harm
>>>>> anything (I was never subject to the bug, which is probably where most
>>>>> people are, but they might run it anyway.)
>>>>>
>>>>> As it stands the submodules thing can't be part of the release, we need
>>>>> to package it up as a single zip file or something.
>>>>>
>>>>> Is there anything else that needs to be done before we can release this?
>>>>>
>>>>> Chris
>>>>>
>>>>>> --
>>>>>> Jason Smith
>>>>>> Couchio Hosting
>>>>>
>>>>
>>>
>>> --
>>> Jason Smith
>>> Couchio Hosting
>>
>
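
For readers following the naming discussion in this thread, here is a minimal sketch of the target-name rule Jason describes (lost+found/databasename, except that names starting with _ go under _system/). It is illustrative only and is not the recover-couchdb code: the function name `lost_and_found_name` is hypothetical, and keeping the leading underscore in the `_system/` form is an assumption, since the thread does not spell that out.

```python
def lost_and_found_name(db_name: str) -> str:
    """Return the recovery target name for db_name.

    Sketch of the convention described in the thread; the helper name
    and the exact handling of the leading underscore are assumptions.
    """
    if not db_name:
        raise ValueError("database name must not be empty")
    # Names starting with "_" (such as _users) go under _system/,
    # everything else goes under lost+found/.
    prefix = "_system" if db_name.startswith("_") else "lost+found"
    return f"{prefix}/{db_name}"


if __name__ == "__main__":
    for name in ("bench", "_users", "_replicator"):
        print(name, "->", lost_and_found_name(name))
```

Keeping the rule in one small function makes it easy to change the prefix later without touching the callers.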
