Re: data recovery tool progress

J Chris Anderson Thu, 12 Aug 2010 14:16:02 -0700

On Aug 12, 2010, at 12:36 PM, Adam Kocoloski wrote:

> Right, and jchris' db_repair branch includes my patches for DB reader _admin 
> access and a more useful progress report in the replication phase of the 
> repair.
>


I've updated the repair branch with everyone's code. I think it is faster, due 
to Adam's idea that if we run the merges in reverse order, those near the front 
of the file are more likely to be no-ops, so less work is done overall.
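
To illustrate why reverse order tends to save work, here is a toy model in Python. It is only a sketch, not the couch_db_repair code: the `merge_nodes` function, the `nodes` list, and the integer revisions are all hypothetical stand-ins, and it assumes that a merge is a no-op whenever the target already holds a newer revision of a document.

```python
def merge_nodes(nodes, order="reverse"):
    """Merge toy 'nodes' into a target dict and count the real writes.

    Each node is (file_offset, {doc_id: revision}). A document is written
    only when the target has no copy or an older revision; otherwise the
    merge of that document is a no-op.
    """
    target = {}
    writes = 0
    sequence = reversed(nodes) if order == "reverse" else nodes
    for _offset, docs in sequence:
        for doc_id, rev in docs.items():
            if target.get(doc_id, -1) < rev:
                target[doc_id] = rev
                writes += 1
    return target, writes


# Hypothetical file contents: later offsets hold newer revisions.
nodes = [
    (0, {"a": 1, "b": 1}),
    (100, {"a": 2}),
    (200, {"a": 3, "b": 2}),
]

forward_target, forward_writes = merge_nodes(nodes, order="forward")
reverse_target, reverse_writes = merge_nodes(nodes, order="reverse")
print(forward_writes, reverse_writes)
```

Both orders end with the same target contents, but in this toy case the reverse pass writes each document once, while the forward pass rewrites documents every time a newer revision turns up later in the file.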

Mikeal will be testing for correctness. Could others please use it and test 
for usability as well? Latest code (with instructions) is here:

http://github.com/jhs/recover-couchdb/

Which points at http://github.com/jchris/couchdb/tree/db_repair for the repair 
code.

One thing I am not clear about (we need better docs): do we need to replicate 
the original db to the lost+found db (or vice versa) after recovery is 
complete?

Chris

> Adam
> 
> On Aug 12, 2010, at 3:14 PM, Jason Smith wrote:
> 
>> The code is updated with the following changes:
>> 1. Adhere to the lost+found/databasename custom...
>> 2. ...except databases starting with _, which go into
>> _system/databasename
>> 3. Sync up with jchris's db_repair branch
>> 
>> (About #2, I started with _/database but I think it's too easy to miss at
>> the command line.)
>> 
>> On Fri, Aug 13, 2010 at 00:52, J Chris Anderson <[email protected]> wrote:
>> 
>>> A few bug reports from my testing:
>>> 
>>> I launched with this command, as specified in the README:
>>> 
>>> find ~/code/couchdb/tmp/lib -type f -name '*.couch' -exec ./recover_couchdb {} \;
>>> 
>>> 
>>> 
>>> First of all, it chokes on my _users and _replicator db:
>>> 
>>> [info] [<0.2.0>] couch_db_repair for _users - scanning 335961 bytes at 0
>>> [error] [<0.2.0>] couch_db_repair merge node at 332061 {case_clause,
>>>                                    {error,illegal_database_name}}
>>> 
>>> That second [error] line is repeated many many times (once per merge I
>>> think). I think the issue is that _users is hard-coded to be OK, but
>>> _users_lost+found is not. So we should do something about that, maybe if a
>>> db-name starts with _ we should call the lost and found a_users_lost+found
>>> (_ sorts at the top, so "a" will be near it and legal).
>>> 
>>> 
>>> 
>>> When a database has readers defined in the security object, the tool is
>>> unable to open them (the reading part of the repair tool needs to have the
>>> _admin userCtx, not just the writer).
>>> 
>>> [debug] [<0.2.0>] Not a reader: UserCtx {user_ctx,null,[],undefined} vs
>>> Names [<<"joe">>] Roles [<<"_admin">>]
>>> escript: exception throw: {unauthorized,<<"You are not authorized to access
>>> this db.">>}
>>> in function  couch_db:open/2
>>> in call from couch_db_repair:make_lost_and_found/3
>>> in call from recover_couchdb:main/1
>>> in call from escript:run/2
>>> in call from escript:start/1
>>> in call from init:start_it/1
>>> in call from init:start_em/1
>>> 
>>> 
>>> It would also be helpful if the status lines could say something more than
>>> 
>>> [info] [<0.2.0>] couch_db_repair writing 15 updates to bench_lost+found
>>> 
>>> Like maybe add a note like "about 23% complete" if at all possible.
>>> 
>>> 
>>> I will patch the first few, I'd love help from someone on the last one.
>>> I'll be on IRC.
>>> 
>>> 
>>> Cheers,
>>> Chris
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Aug 12, 2010, at 10:18 AM, J Chris Anderson wrote:
>>> 
>>>> 
>>>> On Aug 11, 2010, at 2:14 PM, Jason Smith wrote:
>>>> 
>>>>> Hi, Jason.
>>>>> 
>>>>> On Thu, Aug 12, 2010 at 04:14, Jason Smith <[email protected]> wrote:
>>>>> 
>>>>>> On Wed, Aug 11, 2010 at 09:52, Adam Kocoloski <[email protected]> wrote:
>>>>>> 
>>>>>>> Excellent, thanks for testing.  I caught Jason Smith saying on IRC that
>>>>>>> he had packaged the whole thing up as an escript + some .beams.  If we
>>>>>>> can get it down to a single file a la rebar that would be a pretty sweet
>>>>>>> way to deliver the repair tool in my opinion.
>>>>>>> 
>>>>>> 
>>>>>> Please check out http://github.com/jhs/repair-couchdb
>>>>>> 
>>>>> 
>>>>> I think you mean http://github.com/jhs/recover-couchdb
>>>>> 
>>>> 
>>>> I think it is important that we package and release this, if it is ready.
>>>> We should link to it from the bug description page, the project home page,
>>>> as well as blog about it, etc. What is the point of working feverishly on a
>>>> recovery tool if we don't go the last mile?
>>>> 
>>>> I am testing it now on my database directory to make sure it doesn't harm
>>>> anything (I was never subject to the bug, which is probably where most
>>>> people are, but they might run it anyway.)
>>>> 
>>>> As it stands the submodules thing can't be part of the release, we need
>>>> to package it up as a single zip file or something.
>>>> 
>>>> Is there anything else that needs to be done before we can release this?
>>>> 
>>>> Chris
>>>> 
>>>>> --
>>>>> Jason Smith
>>>>> Couchio Hosting
>>>> 
>>> 
>>> 
>> 
>> 
>> -- 
>> Jason Smith
>> Couchio Hosting
>
