Assuming there are no deletes, would the following work as a way to load the
*last added document*, and would it survive optimize as well?
As far as I remember, the order of document IDs in Lucene survives optimize?

IndexReader ir...
int lastDocId = ir.maxDoc() - 1;
if (lastDocId > 0) //? What does maxDoc() return on an empty index, 0 or 1?
    Document d = ir.document(lastDocId);

Would this correspond to the last committed document (at the commit point where
the index reader was opened)?

Or to the last added document, including pending/uncommitted ones? (I am not
getting the IndexReader from the IndexWriter, no NRT yet...)


The problem I am trying to solve is incremental updates (there are no
deletions). With a unique, numerical uid stored in the index that increases
with every add, I just need a way to find max(uid) as of the last commit to get
my delta from the database.

1. The solution above was one of the options.
2. The second would be to iterate the TermsEnum for the uid field until I hit
the end, but this sounds slow (even if I start skipping around like a monkey)?

3. The third option would be to index a reversed uid (HUGE_CONSTANT - uid), so
that the latest one ends up at the top of the terms dictionary?

4. And finally, the last option I am thinking of would be to track max(uid) and
write it as user data with IndexWriter.commit(Map...), so I could read it back
easily (piggy-backing on the Lucene commit is as safe as it gets, better than
persisting my own files...)
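
To make option 3 concrete, here is a rough sketch of the encoding only. It uses
a TreeSet as a stand-in for the sorted terms dictionary and touches no Lucene
API. The class name, the value chosen for HUGE_CONSTANT, and the fixed-width
zero padding are my assumptions; the padding is there because terms sort as
strings, so without it "9" would sort after "10".

```java
import java.util.Locale;
import java.util.TreeSet;

// Illustration only: a TreeSet stands in for a sorted terms dictionary.
final class ReverseUidSketch {
    // Assumed constant; any value at least as large as every possible uid works.
    static final long HUGE_CONSTANT = Long.MAX_VALUE;

    // Larger uids map to smaller terms. The fixed width of 19 digits keeps
    // string order identical to numeric order (assumes uid >= 0).
    static String encode(long uid) {
        return String.format(Locale.ROOT, "%019d", HUGE_CONSTANT - uid);
    }

    // Inverse of encode().
    static long decode(String term) {
        return HUGE_CONSTANT - Long.parseLong(term);
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>();
        for (long uid : new long[] {7L, 42L, 1000L, 999L}) {
            terms.add(encode(uid));
        }
        // The first term in sorted order belongs to the largest uid.
        System.out.println(decode(terms.first())); // prints 1000
    }
}
```

With this layout, finding max(uid) would amount to reading the first term of
the field, at the price of indexing one extra term per document.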

I like the last option, but I have no idea how to create a beforeCommitListener
in Solr.
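
For option 4, the bookkeeping side could look roughly like the sketch below.
It only builds and parses the Map that would be handed to
IndexWriter.commit(Map...); the Solr hook and reading the map back from the
commit point are not shown. The class name and the "maxUid" key are made up
for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: tracks the largest uid seen and converts it to and from
// the String map that would travel with a commit.
final class MaxUidCommitData {
    // Hypothetical key name, not something Lucene or Solr defines.
    static final String KEY = "maxUid";

    private long maxUid;

    // Seed with the value recovered from the previous commit, or -1 if none.
    MaxUidCommitData(long initialMaxUid) {
        this.maxUid = initialMaxUid;
    }

    // Call for every document added since the last commit.
    void onAdd(long uid) {
        maxUid = Math.max(maxUid, uid);
    }

    // Map to pass along with the commit.
    Map<String, String> toCommitData() {
        Map<String, String> data = new HashMap<>();
        data.put(KEY, Long.toString(maxUid));
        return data;
    }

    // Returns -1 when the commit carries no max uid (for example, a new index).
    static long fromCommitData(Map<String, String> data) {
        String value = (data == null) ? null : data.get(KEY);
        return (value == null) ? -1L : Long.parseLong(value);
    }
}
```

On startup the tracker would be seeded with fromCommitData(...) of the last
commit, and the delta query against the database would ask for uid greater
than that value.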


The most robust are 2 and 3, but they may be slow-ish (there are 100-200 million
documents/UIDs).
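
On the slowness of option 2: if the uid terms were indexed as fixed-width,
zero-padded strings (an assumption on my side), term order would equal numeric
order, and "skipping around" could be a plain binary search of seeks instead of
a full scan. A minimal sketch of that idea, with a NavigableSet standing in for
the terms dictionary and ceiling() standing in for a seek to the first term at
or above a target. All names here are hypothetical and no Lucene API is used.

```java
import java.util.Locale;
import java.util.NavigableSet;
import java.util.TreeSet;

// Illustration only: finds the largest uid with a binary search of seeks.
final class MaxUidBySeek {
    // Fixed-width encoding so that string order equals numeric order.
    static String pad(long uid) {
        return String.format(Locale.ROOT, "%019d", uid);
    }

    // Returns the largest uid among the terms, or -1 if there are none.
    // Assumes every term is a 19-digit, zero-padded, non-negative uid.
    static long maxUid(NavigableSet<String> terms) {
        String first = terms.ceiling(pad(0L));
        if (first == null) {
            return -1L; // empty terms dictionary
        }
        long lo = Long.parseLong(first); // a uid known to exist
        long hi = Long.MAX_VALUE;        // upper bound for the answer
        while (lo < hi) {
            long mid = hi - (hi - lo) / 2;     // upper middle, always > lo
            String hit = terms.ceiling(pad(mid));
            if (hit == null) {
                hi = mid - 1;                  // nothing at or above mid
            } else {
                lo = Long.parseLong(hit);      // jump to the term found
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>();
        for (long uid : new long[] {1L, 5L, 123456789L}) {
            terms.add(pad(uid));
        }
        System.out.println(maxUid(terms)); // prints 123456789
    }
}
```

Each step at least halves the remaining range, so the number of seeks stays on
the order of 64 regardless of how many documents there are.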

Any better ideas? (And no, the DIH wall-clock timestamp is not good enough.)

I am talking about Solr/Lucene 4 trunk; we decided to take the risk :)
 
Thanks, 
eks
