Re: how to rebuild a index corrupted?

Cristian Lorenzetto Thu, 23 Mar 2017 08:16:15 -0700

Yes, exactly. Having worked on systems that use Lucene (Alfresco projects,
for example), I have seen that index corruption happens occasionally, and
every time the rebuild takes a long time, so I was looking for a way to
speed up the repair of a corrupted index. In addition, there is a rare case
not described here: if Lucene throws an exception after a database commit
(for example, because the disk is full), the database and the Lucene index
can end up misaligned. With this system these problems could be solved
automatically. In the database every row has a property holding its
transaction ID. So if I know that segment 6 is missing from Lucene and that
it corresponds to the transaction range [1000, 1050], I can reload just the
corresponding rows with a single database query.
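
A minimal sketch of that last idea, in plain Java with no Lucene calls. The
class SegmentTxRanges, the Row type and the in-memory list standing in for
the database table are all hypothetical names used only for illustration.
Lucene does not keep such a segment-to-transaction mapping itself, so the
application would have to maintain it, and update it whenever segments are
merged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical bookkeeping, not a Lucene API: which transaction ids went into which segment. */
final class SegmentTxRanges {

    /** Stand-in for a database row that carries its transaction id. */
    static final class Row {
        final long txId;
        final String payload;

        Row(long txId, String payload) {
            this.txId = txId;
            this.payload = payload;
        }
    }

    // segment number -> {first transaction id, last transaction id}, both inclusive
    private final Map<Integer, long[]> rangeBySegment = new TreeMap<>();

    /** Remember that a segment holds the operations of transactions firstTx..lastTx. */
    void record(int segment, long firstTx, long lastTx) {
        if (firstTx > lastTx) {
            throw new IllegalArgumentException("firstTx must be <= lastTx");
        }
        rangeBySegment.put(segment, new long[] {firstTx, lastTx});
    }

    /**
     * Rows to re-index for a missing segment. The loop stands in for a range query
     * on the transaction id column of the real database (assumed, not shown here).
     * Returns an empty list when the segment was never recorded.
     */
    List<Row> rowsToReindex(int missingSegment, List<Row> table) {
        List<Row> result = new ArrayList<>();
        long[] range = rangeBySegment.get(missingSegment);
        if (range == null) {
            return result;
        }
        for (Row row : table) {
            if (row.txId >= range[0] && row.txId <= range[1]) {
                result.add(row);
            }
        }
        return result;
    }
}
```

With the numbers from the example above, record(6, 1000, 1050) followed by
rowsToReindex(6, rows) returns only the rows whose transaction ID lies
between 1000 and 1050 inclusive.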


2017-03-23 14:59 GMT+01:00 Michael McCandless <[email protected]>:

> You should be able to use the sequence numbers returned by IndexWriter
> operations to "know" which operations made it into the commit and which did
> not, and then on disaster recovery replay only those operations that didn't
> make it?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Mar 23, 2017 at 5:53 AM, Cristian Lorenzetto <
> [email protected]> wrote:
>
>> Errata corrige/addendum to the questions in my previous post.
>>
>> I studied these Lucene classes a bit to understand them:
>> 1) setCommitData is designed for versioning the index, not for passing a
>> transaction log. However, if the user data is different for every
>> transaction ID, it is equivalent.
>> 2) NRT refreshes the searcher/reader automatically; it does not call
>> commit. I based my NRT implementation on
>> http://stackoverflow.com/questions/17993960/lucene-4-4-0-new-controlledrealtimereopenthread-sample-usage.
>> In this example commit is executed synchronously for every CRUD
>> operation, but in general it is advised to use a batch thread because
>> commit is a long operation. *So it is not clear how to do the commit in
>> a near-real-time system with an unbounded index size.*
>>     2.a If the commit is synchronous, I can use the user data because it
>> is set before a commit: every commit has different user data and I can
>> trace the transaction changes. But in general a commit can take minutes
>> to complete, so it does not seem a real solution for a near-real-time
>> system.
>>     2.b If the commit is asynchronous, it is executed periodically (every
>> X, or better, when memory is full). The commit cannot be used to trace
>> each transaction, but I can pass a transaction ID associated with a
>> Lucene commit. I can add a mutex around the CRUD operations (while I am
>> loading uncommitted data) so that I am sure the last uncommitted index
>> state is aligned with the last transaction ID X; there is then no
>> overlap, and the CRUD block is very short when it happens. But how can I
>> guarantee that the commit corresponds to the last commitIndex that I
>> loaded? Maybe by introducing that mutex in a custom MergePolicy?
>> Is what I have written so far correct? Is 2.b the best solution? In that
>> case, how can I guarantee that the commit is based on the uncommitted
>> data loaded in a specific commitIndex?
>>
>>
>>
>>
>>
>> 2017-03-22 15:32 GMT+01:00 Michael McCandless <[email protected]>
>> :
>>
>>> Hi, I think you forgot to CC the lucene user's list (
>>> [email protected]) in your reply?  Can you resend?
>>>
>>> Thanks.
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Wed, Mar 22, 2017 at 9:02 AM, Cristian Lorenzetto <
>>> [email protected]> wrote:
>>>
>>>> Hi, I am thinking about what you told me in your previous message, and
>>>> about how to solve both the corruption problem and the problem of the
>>>> commit operation being executed asynchronously.
>>>>
>>>> I am thinking of creating a simple transaction log in a file.
>>>> I use an atomic long sequence to get an orderable transaction ID.
>>>>
>>>> When I perform a new operation:
>>>> 1) generate a new incremental transaction ID
>>>> 2) save the operation's abstract info in the transaction log, associated
>>>> with the ID:
>>>>     2.a for insert and update, a serialized version of the object to
>>>> save
>>>>     2.b for delete, the serialized query that selects what to delete
>>>> 3) execute the same operation in Lucene, first adding a transactionId
>>>> property (executed in RAM)
>>>>
>>>> 4) the commit is executed asynchronously. After the commit, the
>>>> transaction log is deleted up to the last committed transaction ID. (I
>>>> do not know how to insert a block after the commit when using a
>>>> near-real-time reader and SearcherManager.) I might introduce some logic
>>>> into the way a commit is done. The order is similar to a queue, so it
>>>> follows the transactionId order. Is there an example of how to commit a
>>>> specific set of uncommitted operations?
>>>>
>>>> 5) I need the guarantee that after a CRUD operation the data is
>>>> available in memory for a possible imminent search, so I think I might
>>>> execute flush/refreshReader after every CUD operation
>>>>
>>>> If there is a failure, the transaction log will not be empty, but I can
>>>> re-execute the pending operations after restart.
>>>> Maybe it could also be useful for fixing a corruption, but is it certain
>>>> that the corruption does not also touch segments that were completely
>>>> committed in the past? Or, for a stable solution, should I save the data
>>>> in a secondary repository anyway?
>>>>
>>>>
>>>>
>>>> In your opinion, would this solution be sufficient? Is it a good
>>>> solution, or am I forgetting some aspects?
>>>>
>>>> PS Another interesting aspect could be to associate each segment with
>>>> a transaction. That way, if a segment is missing, I can apply it again
>>>> without rebuilding the whole index from scratch.
>>>>
>>>> 2017-03-21 0:58 GMT+01:00 Michael McCandless <[email protected]
>>>> >:
>>>>
>>>>> You can use Lucene's CheckIndex tool with the -exorcise option but
>>>>> this is quite brutal: it simply drops any segment that has corruption it
>>>>> detects.
>>>>>
>>>>> Mike McCandless
>>>>>
>>>>> http://blog.mikemccandless.com
>>>>>
>>>>> On Mon, Mar 20, 2017 at 4:44 PM, Marco Reis <[email protected]> wrote:
>>>>>
>>>>>> I'm afraid it's not possible to rebuild the index. It's important to
>>>>>> maintain a backup policy because of that.
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 20, 2017 at 5:12 PM Cristian Lorenzetto <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>> > Can Lucene rebuild the index using its internal info, and how? Or do I
>>>>>> > have to reinsert everything some other way?
>>>>>> >
>>>>>> --
>>>>>> Marco Reis
>>>>>> Software Architect
>>>>>> http://marcoreis.net
>>>>>> https://github.com/masreis
>>>>>> +55 61 9 81194620
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
