Thanks Yonik. Assuming I am not going to index the ID, only option 4 remains so far. I have no other ideas, and a Log* merge policy would mean all the Lucene 4 indexing magic went to nothing :)
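Option 4 (track max(UID) and write it with IndexWriter.commit(Map...)) works because the user data becomes visible in the same step as the documents it describes. The sketch below is a stdlib-only model of that idea, not Lucene's or Solr's API. The class `CommitUserDataSketch`, its methods and the `maxUid` key are all hypothetical. A reader opened on the last commit sees a `maxUid` that matches the committed documents, even while newer adds are still pending.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of an index with commit points (NOT the Lucene API).
 * Adds are buffered until commit(); the user data map is captured in the
 * same step, so committed docs and committed user data never disagree.
 */
public class CommitUserDataSketch {

    /** Immutable snapshot: committed uids plus the user data stored with them. */
    record CommitPoint(List<Long> uids, Map<String, String> userData) {}

    private final List<Long> pendingUids = new ArrayList<>();
    private long maxUidSeen = -1; // tracked while indexing, as in option 4
    private CommitPoint lastCommit = new CommitPoint(List.of(), Map.of());

    void addDocument(long uid) {
        pendingUids.add(uid);
        maxUidSeen = Math.max(maxUidSeen, uid);
    }

    /** Similar in spirit to commit(Map): docs and user data are published together. */
    void commit() {
        List<Long> uids = new ArrayList<>(lastCommit.uids());
        uids.addAll(pendingUids);
        Map<String, String> userData = new HashMap<>(lastCommit.userData());
        if (maxUidSeen >= 0) {
            userData.put("maxUid", Long.toString(maxUidSeen));
        }
        lastCommit = new CommitPoint(List.copyOf(uids), Map.copyOf(userData));
        pendingUids.clear();
    }

    /** What a reader opened on the last commit point would see. */
    CommitPoint openReader() {
        return lastCommit;
    }

    public static void main(String[] args) {
        CommitUserDataSketch index = new CommitUserDataSketch();
        index.addDocument(1);
        index.addDocument(2);
        index.commit();
        index.addDocument(3); // pending: not visible to readers yet
        CommitPoint reader = index.openReader();
        // prints "committed docs: 2, maxUid: 2"
        System.out.println("committed docs: " + reader.uids().size()
                + ", maxUid: " + reader.userData().get("maxUid"));
    }
}
```

The delta query against the database would then start after the committed `maxUid`, never after an uncommitted one.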
Could the following do the job? Clone DefaultIndexWriterProvider into my codebase (ugly, and it has to be kept in sync, but doable), and make it provide an EnhancedSolrIndexWriter extends SolrIndexWriter:

    @Override
    commit(...) {
        super.commit(Core.getUserMap()); // Map<String, String>
    }

and the same with close(...).

If so, is this a feature Solr could use? A Map<String, String> userParams somewhere in the core that gets committed with whatever it holds at commit time. Could I then wrap up a patch that modifies SolrIndexWriter directly? The nice thing is that one could keep a small map of key/value pairs in sync with commit points, with all the goodies of TwoPhaseCommit, for things that must never get out of sync, like my use case below. I imagine DIH could use it as well.

---------------------------------------------------------

No longer... the default merge policy can now merge non-contiguous segments. You can of course still select a Log* merge policy, which never reorders ids with respect to each other.

-Yonik
http://www.lucidimagination.com

________________________________
From: eks dev <eks...@yahoo.co.uk>
To: dev@lucene.apache.org
Sent: Sat, 6 August, 2011 20:47:09
Subject: IndexReader.maxDoc() and other

Assuming there are no deletes, would the following work as a way to load the *last added document*, surviving optimize as well? As far as I remember, the order of document IDs in Lucene survives optimize?

    IndexReader ir = ...;
    int lastDoc = ir.maxDoc() - 1;
    if (lastDoc >= 0) { // What does maxDoc() return on an empty index, 0 or 1?
        Document d = ir.document(lastDoc);
    }

Would this correspond to the last committed document (at the commit point where the IndexReader was opened), or to the last added document, including pending/uncommitted ones? (I am not getting the IndexReader from the IndexWriter; no NRT yet.)

The problem I am trying to solve is incremental updates (there are no deletions).
I have a unique, numerical uid stored in the index that increases with every add, so I just need a way to find max(uid) as of the last commit to get my delta from the database. The solution above was the first option. The others:

2. Iterate a TermsEnum over the uid field until I hit the end, but this sounds slow (even if I start skipping around like a monkey).
3. Index a reversed uid (HUGE_CONSTANT - uid), so that it ends up on top of the terms dictionary.
4. Track max(UID) and write it as a user parameter with IndexWriter.commit(Map...), so I can read it back easily (piggy-backing on the Lucene commit is as safe as it gets, better than persisting my own files).

I like the last option, but have no idea how to create a beforeCommitListener in Solr. The most robust options are 2/3, but they may be slow-ish (there are 100-200 million documents/UIDs).

Any better ideas? (And no, the DIH wall-clock timestamp is not good enough.)

I am talking about Solr/Lucene 4 trunk; we decided to take a risk :)

Thanks,
eks
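Options 2 and 3 rely on the terms dictionary being sorted. A forward terms enumeration has to walk to the end to find the largest uid, while indexing HUGE_CONSTANT - uid puts the largest uid first. The sketch below models this with a `TreeSet<String>` standing in for the terms dictionary (lexicographic order, which matches code-point order for ASCII digits). Choosing `Long.MAX_VALUE` as HUGE_CONSTANT and zero-padding terms to 19 digits are assumptions for illustration. The padding matters: unpadded, "9" sorts after "10".

```java
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

/**
 * Hypothetical model of options 2 and 3 (NOT the Lucene TermsEnum API).
 * A TreeSet<String> stands in for the sorted terms dictionary of the uid field.
 */
public class ReverseUidTermSketch {

    // Assumption: uids are non-negative longs; Long.MAX_VALUE has 19 digits.
    static final long HUGE_CONSTANT = Long.MAX_VALUE;
    static final int WIDTH = 19;

    /** Fixed-width encoding so lexicographic order equals numeric order. */
    static String encodePlain(long uid) {
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", uid);
    }

    /** Option 3: larger uid -> smaller number -> earlier term. */
    static String encodeReverse(long uid) {
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", HUGE_CONSTANT - uid);
    }

    static long decodeReverse(String term) {
        return HUGE_CONSTANT - Long.parseLong(term);
    }

    /** Option 2: walk every term in ascending order; cost grows with the term count. */
    static long maxUidByFullScan(TreeSet<String> plainTerms) {
        long max = -1;
        for (String term : plainTerms) {
            max = Long.parseLong(term); // ascending order, so the last term wins
        }
        return max;
    }

    /** Option 3: the max uid is the first term (assumes at least one term exists). */
    static long maxUidFromFirstTerm(TreeSet<String> reverseTerms) {
        return decodeReverse(reverseTerms.first());
    }

    public static void main(String[] args) {
        TreeSet<String> plainTerms = new TreeSet<>();
        TreeSet<String> reverseTerms = new TreeSet<>();
        for (long uid : List.of(5L, 42L, 7L, 100L, 9L)) {
            plainTerms.add(encodePlain(uid));
            reverseTerms.add(encodeReverse(uid));
        }
        System.out.println("full scan: " + maxUidByFullScan(plainTerms));            // 100
        System.out.println("first reverse term: " + maxUidFromFirstTerm(reverseTerms)); // 100
    }
}
```

Both approaches return the same answer. The reversed encoding trades an extra indexed field for reading a single term instead of scanning the whole uid field.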