Following the flow of my thinking, I added an explanation to avoid a misunderstanding. I use a transactionId not to introduce transactions into Lucene (an async commit rules out a traditional transaction system) but to tag segments with an external key (the transactionId). So if, because of a corruption error in the index, I cannot find segment 5, by searching segments 4 and 6 I can work out the range of foreign keys (transaction ids) to reload into Lucene. That way I can load into Lucene all the missing documents, reloading them for example from a database.
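To make the idea concrete, here is a minimal sketch of the gap-detection step in plain Java (no Lucene dependency; the class and method names are hypothetical): given the transaction ids still visible in the surviving segments and the highest id ever assigned, it computes the id ranges that must be reloaded from the external store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class TransactionGapFinder {

    /** A contiguous range [from, to] of transaction ids missing from the index. */
    public record Range(long from, long to) {}

    /**
     * Given the transaction ids found in the surviving segments and the
     * highest id ever assigned, return the id ranges that must be
     * reloaded from the external store (e.g. a database).
     */
    public static List<Range> findMissingRanges(Iterable<Long> survivingIds, long maxAssignedId) {
        TreeSet<Long> present = new TreeSet<>();
        survivingIds.forEach(present::add);

        List<Range> missing = new ArrayList<>();
        long expected = 1; // transaction ids start at 1 in this sketch
        for (long id : present) {
            if (id > expected) {
                missing.add(new Range(expected, id - 1)); // gap before this id
            }
            expected = id + 1;
        }
        if (expected <= maxAssignedId) {
            missing.add(new Range(expected, maxAssignedId)); // tail gap after the last surviving id
        }
        return missing;
    }
}
```

For example, if the surviving segments contain ids {1, 2, 4, 6, 7} and the sequence reached 9, the result is the ranges [3,3], [5,5] and [8,9] — exactly the documents to reload from the database.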
2017-03-23 10:53 GMT+01:00 Cristian Lorenzetto <cristian.lorenze...@gmail.com>:

> Errata corrige / follow-up on the questions in my previous post.
>
> I studied the Lucene classes a bit to understand:
>
> 1) setCommitData is designed for versioning the index, not for passing a
> transaction log. However, if the userData is different for every
> transactionId, it is equivalent.
>
> 2) NRT refreshes the searcher/reader automatically; it does not call
> commit. I based my implementation on
> http://stackoverflow.com/questions/17993960/lucene-4-4-0-new-controlledrealtimereopenthread-sample-usage.
> In that example a commit is executed synchronously for every CRUD
> operation, but in general it is advised to commit from a batch thread,
> because commit is a long operation. *So it is not clear how to do the
> commit in a near-real-time system with an index of indefinite size.*
>
> 2.a) If the commit is synchronous, I can use userData, because it is set
> before a commit: every commit has different userData and I can trace the
> transaction changes. But in general a commit can take minutes to
> complete, so it does not seem a realistic option in a near-real-time
> system.
>
> 2.b) If the commit is asynchronous, executed every X seconds (or better,
> when memory is full), the commit cannot be used for tracing the
> transactions, but I can associate a transaction id with each Lucene
> commit. If I add a mutex in the CRUD path (while loading uncommitted
> data), I am sure the last uncommitted index is aligned to the last
> transaction id X, so there is no overlapping, and the CRUD block is very
> fast when it happens. But how can I guarantee that the commit covers
> exactly the last commit index I loaded? Maybe by introducing that mutex
> in a custom MergePolicy?
>
> Is what I wrote so far correct? Is 2.b the best solution? In that case,
> how can I guarantee the commit is done based on the uncommitted data
> loaded at a specific commit index?
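The asynchronous batch-commit idea in 2.b can be sketched as follows in plain Java (hypothetical class and method names; the Lucene calls that a real implementation would make appear only as comments, since this sketch deliberately has no Lucene dependency): CRUD operations take ids from an atomic sequence and are NRT-visible immediately, while a background task periodically snapshots the sequence, "commits" everything up to it, and records the last committed id.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class BatchCommitter {

    private final AtomicLong nextId = new AtomicLong(0);          // transaction id sequence
    private final AtomicLong lastCommittedId = new AtomicLong(0); // highest id covered by the last commit
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    /** Each CRUD operation gets a new id; the write itself is NRT-visible right away. */
    public long apply(Runnable indexOperation) {
        long id = nextId.incrementAndGet();
        indexOperation.run(); // in real code: writer.addDocument / updateDocument / deleteDocuments
        return id;
    }

    /** Snapshot the sequence and "commit" everything up to that point. */
    public void commitNow() {
        long upTo = nextId.get(); // all ids <= upTo are already in the writer's buffer
        // a real implementation would do roughly (per the setCommitData idea above):
        //   writer.setCommitData(Map.of("lastTxId", String.valueOf(upTo)));
        //   writer.commit();
        lastCommittedId.set(upTo); // after this, log entries <= upTo can be pruned
    }

    /** Run commitNow periodically from a single background thread. */
    public void start(long periodMillis) {
        scheduler.scheduleAtFixedRate(this::commitNow, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    public long lastCommitted() { return lastCommittedId.get(); }

    public void stop() { scheduler.shutdown(); }
}
```

Because the snapshot of `nextId` is taken before the (slow) commit, the recorded id is a lower bound on what the commit contains, which is the property needed to prune the transaction log safely.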
> 2017-03-22 15:32 GMT+01:00 Michael McCandless <luc...@mikemccandless.com>:
>
>> Hi, I think you forgot to CC the Lucene users' list
>> (java-user@lucene.apache.org) in your reply? Can you resend?
>>
>> Thanks.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Wed, Mar 22, 2017 at 9:02 AM, Cristian Lorenzetto
>> <cristian.lorenze...@gmail.com> wrote:
>>
>>> Hi, I am thinking about what you told me in your previous message, and
>>> about how to solve the corruption problem and the problem of the commit
>>> operation being executed asynchronously.
>>>
>>> I am thinking of creating a simple transaction log in a file, using an
>>> atomic long sequence as an orderable transaction id.
>>>
>>> When I perform a new operation:
>>> 1) generate a new incremental transaction id;
>>> 2) save the operation's abstract info in the transaction log, associated
>>> with the id:
>>>    2.a) for insert/update, a serialized version of the object to save;
>>>    2.b) for delete, the serialized query the delete applies to;
>>> 3) execute the same operation in Lucene, first adding a transactionId
>>> property (executed in RAM);
>>> 4) the commit is executed asynchronously. After the commit, the
>>> transaction log up to the last committed transaction id is deleted. (I
>>> don't know how to hook in after the commit when using a near-real-time
>>> reader and SearcherManager.) I might introduce some logic into the way a
>>> commit is done. The order is similar to a queue, so it follows the
>>> transactionId order. Is there an example of committing a specific set of
>>> uncommitted operations?
>>> 5) I need the guarantee that after a CRUD operation the data is
>>> available in memory for a possibly imminent search, so I think I should
>>> flush/refresh the reader after every CUD operation.
>>>
>>> If there is a failure, the transaction log will not be empty, but after
>>> restart I can re-execute the operations that were not executed.
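The file-based transaction log described in steps 1–5 can be sketched like this (plain Java, stdlib only; the class name, the tab-separated line format, and the `truncateUpTo`/`pendingOperations` helpers are all assumptions of this sketch, not anything from Lucene): each operation is appended with its id before touching the index, entries covered by a commit are pruned, and whatever remains after a crash is exactly what must be replayed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class TransactionLog {

    private final Path logFile;
    private final AtomicLong sequence = new AtomicLong(0); // step 1: incremental id

    public TransactionLog(Path logFile) throws IOException {
        this.logFile = logFile;
        if (!Files.exists(logFile)) Files.createFile(logFile);
    }

    /** Steps 1-2: assign an id and append the serialized operation before touching Lucene. */
    public synchronized long append(String serializedOp) throws IOException {
        long id = sequence.incrementAndGet();
        Files.writeString(logFile, id + "\t" + serializedOp + "\n",
                StandardCharsets.UTF_8, StandardOpenOption.APPEND);
        return id;
    }

    /** Step 4: after a successful commit covering ids <= lastCommittedId, prune the log. */
    public synchronized void truncateUpTo(long lastCommittedId) throws IOException {
        List<String> keep = new ArrayList<>();
        for (String line : Files.readAllLines(logFile, StandardCharsets.UTF_8)) {
            long id = Long.parseLong(line.substring(0, line.indexOf('\t')));
            if (id > lastCommittedId) keep.add(line); // keep only uncommitted entries
        }
        Files.write(logFile, keep, StandardCharsets.UTF_8, StandardOpenOption.TRUNCATE_EXISTING);
    }

    /** On restart: whatever is left in the log was never committed and must be replayed. */
    public synchronized List<String> pendingOperations() throws IOException {
        return Files.readAllLines(logFile, StandardCharsets.UTF_8);
    }
}
```

A real version would also fsync the log file on append; without that, a crash between the Lucene write and the disk flush could still lose the entry.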
>>> Maybe it could also be useful for fixing a corruption, but is it
>>> certain that the corruption does not also touch segments that were
>>> already completely committed in the past? Or, for a stable solution,
>>> should I save the data in a secondary repository anyway?
>>>
>>> In your opinion, will this solution be sufficient? Does it look like a
>>> good solution to you, or am I forgetting some aspects?
>>>
>>> PS Another interesting aspect might be associating a segment with a
>>> transaction. That way, if a segment is missing, I can apply it again
>>> without rebuilding the whole index from scratch.
>>>
>>> 2017-03-21 0:58 GMT+01:00 Michael McCandless <luc...@mikemccandless.com>:
>>>
>>>> You can use Lucene's CheckIndex tool with the -exorcise option, but
>>>> this is quite brutal: it simply drops any segment in which it detects
>>>> corruption.
>>>>
>>>> Mike McCandless
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>> On Mon, Mar 20, 2017 at 4:44 PM, Marco Reis <m...@marcoreis.net> wrote:
>>>>
>>>>> I'm afraid it's not possible to rebuild the index. It's important to
>>>>> maintain a backup policy because of that.
>>>>>
>>>>> On Mon, Mar 20, 2017 at 5:12 PM Cristian Lorenzetto
>>>>> <cristian.lorenze...@gmail.com> wrote:
>>>>>
>>>>>> Can Lucene rebuild the index using its internal info, and how? Or do
>>>>>> I have to reinsert everything in some other way?
>>>>>
>>>>> --
>>>>> Marco Reis
>>>>> Software Architect
>>>>> http://marcoreis.net
>>>>> https://github.com/masreis
>>>>> +55 61 9 81194620