Yes, exactly. In past work on systems that use Lucene (Alfresco projects, for example), I saw that index corruption happens from time to time, and every time the rebuild takes a long time, so I was looking for a way to speed up the repair of a corrupted index. There is also a rare case not described here: if Lucene throws an exception after the database has already committed (for example because the disk is full), the database and the Lucene index can end up out of sync. With this system both problems could be solved automatically. In the database every row has a property holding its transaction id. So if I know that segment 6 is missing from Lucene and that it corresponds to the transaction range [1000, 1050], I can reload just the corresponding rows with a single database query.
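A minimal sketch of that last step, in plain Java with no Lucene calls. Everything named here is hypothetical and only for illustration: the `Row` and `SegmentRecovery` classes, the `recordSegment` and `rowsToReindex` methods, and the `docs` table and `tx_id` column in the comment. It also assumes the segment-to-range mapping is recorded somewhere at commit time; since Lucene merges segments in the background, that mapping would have to be kept up to date across merges, which is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** A database row tagged with the id of the transaction that wrote it. */
final class Row {
    final long txId;
    final String payload;

    Row(long txId, String payload) {
        this.txId = txId;
        this.payload = payload;
    }
}

/** Hypothetical bookkeeping: which transaction ids each index segment covers. */
final class SegmentRecovery {
    // segment number -> {first txId, last txId}, both inclusive
    private final Map<Integer, long[]> rangeBySegment = new TreeMap<>();

    /** Remember that a segment holds the documents of transactions firstTx..lastTx. */
    void recordSegment(int segment, long firstTx, long lastTx) {
        if (firstTx > lastTx) {
            throw new IllegalArgumentException("firstTx must not exceed lastTx");
        }
        rangeBySegment.put(segment, new long[] {firstTx, lastTx});
    }

    /**
     * Returns the rows that must be re-indexed when a segment is missing.
     * The loop stands in for a database query such as:
     *   SELECT * FROM docs WHERE tx_id BETWEEN ? AND ?
     * An unknown segment yields an empty list.
     */
    List<Row> rowsToReindex(int missingSegment, List<Row> table) {
        List<Row> result = new ArrayList<>();
        long[] range = rangeBySegment.get(missingSegment);
        if (range == null) {
            return result;
        }
        for (Row row : table) {
            if (row.txId >= range[0] && row.txId <= range[1]) {
                result.add(row);
            }
        }
        return result;
    }
}
```

With `recordSegment(6, 1000, 1050)` registered, `rowsToReindex(6, rows)` keeps only the rows whose transaction id lies between 1000 and 1050 inclusive; those are the ones that would be fed back to the index.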

2017-03-23 14:59 GMT+01:00 Michael McCandless <luc...@mikemccandless.com>:

> You should be able to use the sequence numbers returned by IndexWriter
> operations to "know" which operations made it into the commit and which did
> not, and then on disaster recovery replay only those operations that didn't
> make it?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Mar 23, 2017 at 5:53 AM, Cristian Lorenzetto <
> cristian.lorenze...@gmail.com> wrote:
>
>> Errata corridge/integration for questions related to previous my post
>>
>> I studied a bit this lucene classes for understanding:
>> 1) setCommitData is designed for versioning the index , not for passing a
>> transaction log. However if userdata is different for every transactionid
>> it is equivalent .
>> 2) NRT refresh automatically searcher/reader it dont call commit. I based
>> my implementation using nrt on
>> http://stackoverflow.com/questions/17993960/lucene-4-4-0-new-controlledrealtimereopenthread-sample-usage.
>> In this example commit is executed for every crud
>> operation in synchronous way but in general it is advised to use a batch
>> thread because the commit is a long operation. *So it is not clear how
>> to do the commit in a near-real time system with a indefinite index size.*
>> 2.a if the commit is synchronous , i can use user data because it is
>> used before a commit, every commit has a different user data and i can
>> trace the transactions changes.But in general a commit can requires also
>> minutes for be completed so then it dont seams a real solution in a near
>> real time solution.
>> 2.b if the commit is async, it is executed every X times (or better
>> how memory if full) , the commit can not be used for tracing the
>> transactions and i can pass a trnsaction id associated with a lucene
>> commit. I can add a mutex in crud ( when i loading uncommit data) i m sure
>> the last uncummit Index is aligned to the last transaction id X, so there
>> is no overlappind and the crud block is very fast when happens.But how to
>> grant that the commit is related to the last CommitIndex what i loaded?
>> Maybe if i introduce that mutex in a custom mergePolicy?
>> It is right what i wrote until now ?The best solution is 2.b? In this
>> case how to grant the commit is done based on the uncommit data loaded in a
>> specific commitIndex?
>>
>> 2017-03-22 15:32 GMT+01:00 Michael McCandless <luc...@mikemccandless.com>:
>>
>>> Hi, I think you forgot to CC the lucene user's list (
>>> java-user@lucene.apache.org) in your reply? Can you resend?
>>>
>>> Thanks.
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Wed, Mar 22, 2017 at 9:02 AM, Cristian Lorenzetto <
>>> cristian.lorenze...@gmail.com> wrote:
>>>
>>>> hi , i m thinking about what you told me in previous message and how to
>>>> solve the corruption problem and the problem about commit operation
>>>> executed in async way.
>>>>
>>>> I m thinking to create a simple transaction log in a file.
>>>> i use a long atomic sequence for a ordinable transaction id.
>>>>
>>>> when i make a new operation
>>>> 1) generate new incremental transaction id
>>>> 2) save the operation abstract info in transaction log associated to
>>>> id.
>>>> 2.a insert ,update with the a serialized version of the object to
>>>> save
>>>> 2b delete the query serialized where apply delete
>>>> 3) execute same operation in lucene adding before property
>>>> transactionId (executed in ram)
>>>>
>>>> 4) in async way commit is executed. After the commit the transaction
>>>> log until last transaction id is deleted.(i dont know how insert block
>>>> after commit , using near real time reader and SearcherManager) I might
>>>> introduce a logic in the way a commit is done. The order is simlilar to a
>>>> queue so it follows the transactionId order. i Is there a example about
>>>> possibility to commit a specific set of uncommit operations?
>>>>
>>>> 5) i need the warrenty after a crud operation the data in available in
>>>> memory in a possible imminent research so i think i might execute
>>>> flush/refreshReader after every CUD operations
>>>>
>>>> if there is a failure transaction log will be not empty. But i can
>>>> rexecute operations not executed after restartup.
>>>> Maybe it could be usefull also for fixing a corruption but it is sure
>>>> the corrution dont touch also segments already commited completely in the
>>>> past? or maybe for a stable solution i might anyway save data in a
>>>> secondary repository ?
>>>>
>>>> for your opinion this solution will be sufficient . It is a good
>>>> solution for you, i m forgetting some aspects?
>>>>
>>>> PS Another interesting aspect maybe could be associate the segment
>>>> associated to a transaction. In this way if a segment is missing i can
>>>> apply again it without rebuild all the index from scratch.
>>>>
>>>> 2017-03-21 0:58 GMT+01:00 Michael McCandless <luc...@mikemccandless.com>:
>>>>
>>>>> You can use Lucene's CheckIndex tool with the -exorcise option but
>>>>> this is quite brutal: it simply drops any segment that has corruption it
>>>>> detects.
>>>>>
>>>>> Mike McCandless
>>>>>
>>>>> http://blog.mikemccandless.com
>>>>>
>>>>> On Mon, Mar 20, 2017 at 4:44 PM, Marco Reis <m...@marcoreis.net> wrote:
>>>>>
>>>>>> I'm afraid it's not possible to rebuild index. It's important to
>>>>>> maintain a
>>>>>> backup policy because of that.
>>>>>>
>>>>>> On Mon, Mar 20, 2017 at 5:12 PM Cristian Lorenzetto <
>>>>>> cristian.lorenze...@gmail.com> wrote:
>>>>>>
>>>>>> > lucene can rebuild index using his internal info and how ? or in
>>>>>> have to
>>>>>> > reinsert all in other way?
>>>>>> >
>>>>>> --
>>>>>> Marco Reis
>>>>>> Software Architect
>>>>>> http://marcoreis.net
>>>>>> https://github.com/masreis
>>>>>> +55 61 9 81194620