Re: data recovery tool progress

Mikeal Rogers Thu, 12 Aug 2010 23:39:27 -0700

I tested the latest code in recover-couchdb and it looks great.

-Mikeal


On Thu, Aug 12, 2010 at 2:33 PM, J Chris Anderson <[email protected]> wrote:

>
> On Aug 12, 2010, at 2:15 PM, J Chris Anderson wrote:
>
> >
> > On Aug 12, 2010, at 12:36 PM, Adam Kocoloski wrote:
> >
> >> Right, and jchris' db_repair branch includes my patches for DB reader
> >> _admin access and a more useful progress report in the replication phase of
> >> the repair.
> >>
> >
> > I've updated the repair branch with everyone's code. I think it is
> > faster, thanks to Adam's idea: if we run the merges in reverse order, those
> > near the front of the file are more likely to be no-ops, so less work is
> > done overall.
> >
> > Mikeal will be testing for correctness. Could others please use it and
> > test for usability as well? Latest code (with instructions) is here:
> >
> > http://github.com/jhs/recover-couchdb/
> >
> > Which points at http://github.com/jchris/couchdb/tree/db_repair for the
> > repair code.
> >
> > One thing I am not clear about (we need better docs): do we need to
> > replicate the original db to the lost+found db (or vice versa) after
> > recovery is complete?
> >
>
> Also, we should be clear about what the semantics for this are. It can
> potentially introduce conflicts if some writes were repeated after restarts.
> Should it always be a noop on dbs that are clean w/r/t the bug?
>
> Chris
>
> > Chris
> >
> >> Adam
> >>
> >> On Aug 12, 2010, at 3:14 PM, Jason Smith wrote:
> >>
> >>> The code is updated with the following changes:
> >>> 1. Adhere to the lost+found/databasename custom...
> >>> 2. ...except databases starting with _, which go into
> >>> _system/databasename
> >>> 3. Sync up with jchris's db_repair branch
> >>>
> >>> (About #2, I started with _/database but I think it's too easy to miss at
> >>> the command line.)
> >>>
> >>> On Fri, Aug 13, 2010 at 00:52, J Chris Anderson <[email protected]>
> wrote:
> >>>
> >>>> A few bug reports from my testing:
> >>>>
> >>>> I launched with this command, as specified in the README:
> >>>>
> >>>> find ~/code/couchdb/tmp/lib -type f -name '*.couch' -exec ./recover_couchdb {} \;
> >>>>
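For readers who would rather script the same walk, here is a minimal Python sketch of what the find invocation above does: collect every regular file ending in .couch under a directory and run the tool once per file. The function names and the dry_run flag are hypothetical, added only for illustration; the tool path ./recover_couchdb and the single file argument are taken from the command as quoted, nothing more is assumed about its interface.

```python
import subprocess
from pathlib import Path


def find_couch_files(root):
    """Return every regular file under root whose name ends in .couch."""
    base = Path(root).expanduser()
    return sorted(p for p in base.rglob("*.couch") if p.is_file())


def recover_all(root, tool="./recover_couchdb", dry_run=True):
    """Build (and optionally run) one tool invocation per database file."""
    commands = [[tool, str(path)] for path in find_couch_files(root)]
    if not dry_run:
        for cmd in commands:
            # check=False: continue with the next file even if one run fails
            subprocess.run(cmd, check=False)
    return commands
```

With dry_run=True the function only returns the commands it would run, which is a cheap way to confirm which database files will be touched before starting a recovery.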
> >>>>
> >>>>
> >>>> First of all, it chokes on my _users and _replicator db:
> >>>>
> >>>> [info] [<0.2.0>] couch_db_repair for _users - scanning 335961 bytes at 0
> >>>> [error] [<0.2.0>] couch_db_repair merge node at 332061 {case_clause,
> >>>>                                   {error,illegal_database_name}}
> >>>>
> >>>> That second [error] line is repeated many times (once per merge, I
> >>>> think). I think the issue is that _users is hard-coded to be OK, but
> >>>> _users_lost+found is not. So we should do something about that: maybe if a
> >>>> db-name starts with _ we should call the lost and found a_users_lost+found
> >>>> (_ sorts at the top, so "a" will be near it and legal).
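The naming rule proposed in this message can be sketched in a few lines of Python. This illustrates the proposal only, not the code in the db_repair branch (a later message in the thread describes a different lost+found/databasename layout). The function name is hypothetical, and the regular expression is an assumption based on CouchDB's documented rule that ordinary database names start with a lowercase letter and contain only lowercase letters, digits, and _ $ ( ) + - /.

```python
import re

# Assumed rule for ordinary database names; system databases such as
# _users are special-cased by CouchDB and are not covered by this pattern.
LEGAL_NAME = re.compile(r"^[a-z][a-z0-9_$()+/-]*$")


def lost_and_found_name(db_name):
    """Name of the lost+found database for db_name, per the proposal above."""
    target = db_name + "_lost+found"
    if db_name.startswith("_"):
        # "_users_lost+found" is rejected, so prefix an "a" to make it legal
        target = "a" + target
    return target
```

Under these assumptions "bench" maps to bench_lost+found, matching the status line quoted later in this message, and "_users" maps to a_users_lost+found, which passes the name check that _users_lost+found fails.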
> >>>>
> >>>>
> >>>>
> >>>> When a database has readers defined in the security object, the tool is
> >>>> unable to open it (the reading part of the repair tool needs to have the
> >>>> _admin userCtx, not just the writer).
> >>>>
> >>>> [debug] [<0.2.0>] Not a reader: UserCtx {user_ctx,null,[],undefined} vs
> >>>> Names [<<"joe">>] Roles [<<"_admin">>]
> >>>> escript: exception throw: {unauthorized,<<"You are not authorized to access
> >>>> this db.">>}
> >>>> in function  couch_db:open/2
> >>>> in call from couch_db_repair:make_lost_and_found/3
> >>>> in call from recover_couchdb:main/1
> >>>> in call from escript:run/2
> >>>> in call from escript:start/1
> >>>> in call from init:start_it/1
> >>>> in call from init:start_em/1
> >>>>
> >>>>
> >>>> It would also be helpful if the status lines could say something more than
> >>>>
> >>>> [info] [<0.2.0>] couch_db_repair writing 15 updates to bench_lost+found
> >>>>
> >>>> Like maybe add a note like "about 23% complete" if at all possible.
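A rough sketch of the kind of status line being requested, written in Python for illustration. How the repair code would measure progress is not stated in this thread, so the two counters (work done so far and total work, for example bytes or merges) are hypothetical, as are the function names.

```python
def progress_note(done, total):
    """Return a note like 'about 23% complete', or '' if total is unknown."""
    if total <= 0:
        return ""
    percent = min(100, max(0, round(100 * done / total)))
    return f"about {percent}% complete"


def status_line(updates, target_db, done, total):
    """Format a status line in the style of the log output quoted above."""
    line = f"couch_db_repair writing {updates} updates to {target_db}"
    note = progress_note(done, total)
    return f"{line} ({note})" if note else line
```

Clamping the percentage keeps the note sensible if the counters drift slightly past the total, and an unknown total falls back to the original message.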
> >>>>
> >>>>
> >>>> I will patch the first few. I'd love help from someone on the last one.
> >>>> I'll be on IRC.
> >>>>
> >>>>
> >>>> Cheers,
> >>>> Chris
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Aug 12, 2010, at 10:18 AM, J Chris Anderson wrote:
> >>>>
> >>>>>
> >>>>> On Aug 11, 2010, at 2:14 PM, Jason Smith wrote:
> >>>>>
> >>>>>> Hi, Jason.
> >>>>>>
> >>>>>> On Thu, Aug 12, 2010 at 04:14, Jason Smith <[email protected]> wrote:
> >>>>>>
> >>>>>>> On Wed, Aug 11, 2010 at 09:52, Adam Kocoloski <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Excellent, thanks for testing.  I caught Jason Smith saying on IRC that he
> >>>>>>>> had packaged the whole thing up as an escript + some .beams.  If we can get
> >>>>>>>> it down to a single file a la rebar that would be a pretty sweet way to
> >>>>>>>> deliver the repair tool in my opinion.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Please check out http://github.com/jhs/repair-couchdb
> >>>>>>>
> >>>>>>
> >>>>>> I think you mean http://github.com/jhs/recover-couchdb
> >>>>>>
> >>>>>
> >>>>> I think it is important that we package and release this, if it is ready.
> >>>>> We should link to it from the bug description page, the project home page,
> >>>>> as well as blog about it, etc. What is the point of working feverishly on a
> >>>>> recovery tool if we don't go the last mile?
> >>>>>
> >>>>> I am testing it now on my database directory to make sure it doesn't harm
> >>>>> anything (I was never subject to the bug, which is probably where most
> >>>>> people are, but they might run it anyway).
> >>>>>
> >>>>> As it stands, the submodules setup can't be part of the release; we need
> >>>>> to package it up as a single zip file or something.
> >>>>>
> >>>>> Is there anything else that needs to be done before we can release this?
> >>>>>
> >>>>> Chris
> >>>>>
> >>>>>> --
> >>>>>> Jason Smith
> >>>>>> Couchio Hosting
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Jason Smith
> >>> Couchio Hosting
> >>
> >
>
>
