Thanks Yonik, 

assuming I am not going to index the ID, then only option 4 remains so far. I
have no other ideas, and a Log* merge policy would mean all the Lucene 4
indexing magic went to nothing :)

Could the following then do the job?
Clone DefaultIndexWriterProvider into my codebase (ugly, has to be kept in
sync, but doable) and make it provide

EnhancedSolrIndexWriter extends SolrIndexWriter

@Override
commit(...) {
   super.commit(Core.getUserMap());   // a Map<String, String>
}

and the same with close(...)
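
To make the idea concrete, here is a minimal sketch of the override pattern
only. It does not touch the real SolrIndexWriter: WriterStandIn is a made-up
stand-in base class, EnhancedWriterSketch is a hypothetical name, and the
Supplier plays the role of the proposed Core.getUserMap(), which does not
exist in Solr today.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Stand-in for the real writer (NOT SolrIndexWriter): it only records
// which user data the most recent commit carried.
class WriterStandIn {
    Map<String, String> lastCommittedUserData = new HashMap<>();
    boolean closed = false;

    void commit(Map<String, String> commitUserData) {
        // Copy, so later changes to the caller's map do not alter this commit.
        lastCommittedUserData = new HashMap<>(commitUserData);
    }

    void commit() {
        commit(new HashMap<>());
    }

    void close() {
        closed = true;
    }
}

// Hypothetical subclass: every commit() and close() picks up whatever the
// user map holds at that moment.
class EnhancedWriterSketch extends WriterStandIn {
    // Assumption: stands in for the proposed Core.getUserMap().
    private final Supplier<Map<String, String>> userMap;

    EnhancedWriterSketch(Supplier<Map<String, String>> userMap) {
        this.userMap = userMap;
    }

    @Override
    void commit() {
        super.commit(userMap.get());
    }

    @Override
    void close() {
        commit();        // final commit carries the user map as well
        super.close();
    }
}
```

Whether the real close() path can be handled this simply is exactly the open
question; the sketch only shows where the map would be attached.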


If yes, is this feature something Solr could use? A Map<String, String>
userParams somewhere in Core that gets committed with whatever it holds at
commit time. I could then wrap up a patch by modifying SolrIndexWriter
directly?

The nice thing about it: one could keep a small map of key-value pairs in
sync with commit points, with all the goods of TwoPhaseCommit, for "no way
for this to get out of sync" things like my use case below. I imagine DIH
could use it as well.



---------------------------------------------------------

No longer... the default merge policy can now merge non-contiguous segments.
You can of course still select a Log* merge policy, which never
reorders ids with respect to each other.

-Yonik
http://www.lucidimagination.com



________________________________
From: eks dev <eks...@yahoo.co.uk>
To: dev@lucene.apache.org
Sent: Sat, 6 August, 2011 20:47:09
Subject: IndexReader.maxDoc()  and other


Assuming there are no deletes, would the following work as a way to load the
*last added document*, surviving optimize as well?
The order of document IDs in Lucene survives optimize, as far as I remember?

IndexReader ir = ...;
int lastDocId = ir.maxDoc() - 1;
if (lastDocId >= 0) { // ? What does maxDoc() return on an empty index, 0 or 1?
    Document d = ir.document(lastDocId);
}

Would this correspond to the last committed document (at the commit point
where the index reader was opened)?

Or to the last added document, including pending/uncommitted ones? (I am not
getting the IndexReader from the IndexWriter, no NRT yet...)
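
On the empty-index question in the snippet above: going by the Lucene
javadoc, maxDoc() returns one greater than the largest possible document
number, so an empty index gives 0 and the guard belongs on maxDoc() itself.
A minimal sketch of just that arithmetic, with no Lucene dependency (the
class and method names are made up for illustration). It says nothing about
whether the highest doc ID is the last added document, which depends on the
merge policy.

```java
import java.util.OptionalInt;

class LastDocId {
    // maxDoc is the value returned by IndexReader.maxDoc(): one greater than
    // the largest document number, hence 0 for an empty index.
    static OptionalInt lastDocId(int maxDoc) {
        return maxDoc > 0 ? OptionalInt.of(maxDoc - 1) : OptionalInt.empty();
    }
}
```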


The problem I am trying to solve is incremental updates (there are no
deletions). Having a unique, numerical uid stored in the index that increases
with every add, I just need a way to find max(uid) as of the last commit to
get my delta from the database.

1. The solution above was one of the options.

2. The second would be to iterate the TermsEnum for the uid field until I hit
the end, but this sounds slow (even if I start skipping around like a monkey)?

3. The third option would be to index a reversed uid (HUGE_CONSTANT - uid), so
that the largest uid comes first in the terms dictionary?

4. And finally, the last option I am thinking of would be to track max(UID)
and write it as a user parameter with IndexWriter.commit(Map...), so I could
read it back easily (piggy-backing on the Lucene commit is as safe as it gets,
better than persisting my own files...)
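
A rough sketch of the bookkeeping side of option 4, independent of Lucene and
Solr. MaxUidTracker, the "maxUid" key and the -1 sentinel are all made up for
illustration and assume uids are non-negative; the map it produces is what
would be handed to IndexWriter.commit(Map...).

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

class MaxUidTracker {
    static final String KEY = "maxUid"; // hypothetical key name
    static final long NONE = -1L;       // assumed sentinel: no uid seen yet

    private final AtomicLong max;

    // Seed with the value read from the previous commit (or NONE), so that a
    // commit without new documents does not lose the old maximum.
    MaxUidTracker(long lastCommittedMax) {
        this.max = new AtomicLong(lastCommittedMax);
    }

    // Call for every document handed to the writer; safe across threads.
    void onAdd(long uid) {
        max.accumulateAndGet(uid, Math::max);
    }

    // Snapshot to pass along with the commit.
    Map<String, String> toCommitData() {
        return Collections.singletonMap(KEY, Long.toString(max.get()));
    }

    // Read side: recover max(uid) from a commit's user data, if present.
    static long fromCommitData(Map<String, String> data) {
        String value = (data == null) ? null : data.get(KEY);
        return (value == null) ? NONE : Long.parseLong(value);
    }
}
```

If documents are added concurrently with the commit, the snapshot and the
commit point would still need to be coordinated; the sketch ignores that.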

I like the last option, but have no idea how to create a beforeCommitListener
in Solr?


The most robust are 2/3, but maybe slow-ish (there are 100-200 million
documents/UIDs)
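
Options 2 and 3 both depend on term order, which is lexicographic rather than
numeric. A small sketch of option 3 under that assumption: the reversed uid
is zero-padded to a fixed width so that string order matches numeric order,
and the largest uid sorts first. A TreeSet stands in for the sorted terms
dictionary; HUGE_CONSTANT and WIDTH are arbitrary illustrative values, not
anything Lucene prescribes.

```java
import java.util.Locale;
import java.util.TreeSet;

class ReverseUidTerms {
    static final long HUGE_CONSTANT = 999_999_999_999L; // assumed upper bound on uid
    static final int WIDTH = 12;                        // number of digits in HUGE_CONSTANT

    // Zero-pad so that lexicographic order equals numeric order.
    static String encode(long uid) {
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", HUGE_CONSTANT - uid);
    }

    static long decode(String term) {
        return HUGE_CONSTANT - Long.parseLong(term);
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(); // stand-in for the terms dictionary
        for (long uid : new long[] {5, 150_000_000L, 42}) {
            terms.add(encode(uid));
        }
        // The first term in sorted order decodes to the largest uid.
        System.out.println(decode(terms.first()));
    }
}
```

With this encoding, finding max(uid) means reading only the first term of the
field instead of walking the whole enumeration.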

Any better ideas? (and no, DIH wall clock timestamp is not good enough)

I am talking about solr/lucene 4 trunk, we decided to take a risk :) 
 
Thanks, 
eks
