This is not something I am very familiar with, but https://issues.apache.org/jira/browse/LUCENE-2312 tracked an effort to improve NRT latency by adding the ability to search directly against the IndexWriter's indexing buffer.
On Tue, Jul 12, 2016 at 16:11, Konstantin <[email protected]> wrote:
> Hello everyone,
> As far as I understand, NRT requires flushing a new segment to disk. Is it
> correct that the write cache is not searchable?
>
> A competing search library, groonga
> <http://groonga.org/docs/characteristic.html>, claims much smaller
> realtime search latency (as far as I understand, via a searchable write
> cache), but loading data into their index takes almost three times
> longer (benchmark in a blog post in Japanese
> <http://blog.createfield.com/entry/2014/07/22/080958>; it seems to be a
> Wikipedia XML dump, I'm not sure if it's the English one).
>
> I've created an incomplete prototype of a searchable write cache in my pet
> project <https://github.com/kk00ss/Rhinodog>, and it takes two times
> longer to index a fraction of Wikipedia using the same EnglishAnalyzer from
> lucene.analysis (probably there is room for optimization). While loading
> data into Lucene I didn't reuse Document instances. The searchable write
> cache was implemented as a bunch of persistent Scala
> SortedMap[TermKey, Measure]s, one per logical core, where TermKey is
> defined as TermKey(termID: Int, docID: Long) and Measure is just a
> frequency and a norm (but could be extended).
>
> Do you think it's worth the slowdown? If so, I'm interested to learn how
> this part of Lucene works while implementing this feature. However, it is
> unclear to me how hard it would be to change the existing implementation. I
> cannot wrap my head around TermsHash and the whole flush process - are
> there any documentation or good blog posts to read about it?
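For readers unfamiliar with the design described in the quoted message, here is a minimal, hypothetical sketch of such a per-core searchable write cache (the class and method names are illustrative, not the actual Rhinodog code): postings live in a sorted map keyed by (termID, docID), so all postings for one term can be read back as an ordered range scan without flushing to disk.

```scala
import scala.collection.immutable.SortedMap

// Key and value shapes as described in the message above.
final case class TermKey(termID: Int, docID: Long)
final case class Measure(frequency: Int, norm: Float)

object TermKey {
  // Order by term first, then by document, so one term's
  // postings form a contiguous range in the sorted map.
  implicit val ordering: Ordering[TermKey] =
    Ordering.by((k: TermKey) => (k.termID, k.docID))
}

// Hypothetical write cache; the prototype keeps one such
// structure per logical core.
class WriteCache {
  private var cache = SortedMap.empty[TermKey, Measure]

  def add(termID: Int, docID: Long, m: Measure): Unit =
    cache += (TermKey(termID, docID) -> m)

  // All postings for termID, ordered by docID: a half-open
  // range scan [ (termID, 0), (termID + 1, 0) ).
  def postings(termID: Int): Iterable[(TermKey, Measure)] =
    cache.range(TermKey(termID, 0L), TermKey(termID + 1, 0L))
}
```

The trade-off the message asks about is visible here: every added posting pays the cost of a sorted-map insertion (unlike Lucene's append-only in-memory postings), in exchange for the buffer being immediately searchable.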
