Another example is Michael Busch's work while at Twitter, extending Lucene so you can do real-time searches of the write cache ... here's a paper describing it: http://www.umiacs.umd.edu/~jimmylin/publications/Busch_etal_ICDE2012.pdf
But this was a very heavy modification of Lucene and was never contributed back. I do think it should be possible (just complex!) to have real-time searching of recently indexed documents, and sorted terms are really only needed if you must support multi-term queries.

Mike McCandless
http://blog.mikemccandless.com

On Tue, Jul 12, 2016 at 12:29 PM, Adrien Grand <[email protected]> wrote:

> This is not something I am very familiar with, but this issue
> https://issues.apache.org/jira/browse/LUCENE-2312 tried to improve NRT
> latency by adding the ability to search directly into the indexing buffer
> of the index writer.
>
> On Tue, Jul 12, 2016 at 16:11, Konstantin <[email protected]> wrote:
>
>> Hello everyone,
>> As far as I understand, NRT requires flushing a new segment to disk. Is
>> it correct that the write cache is not searchable?
>>
>> A competing search library, groonga
>> <http://groonga.org/docs/characteristic.html>, claims much smaller
>> real-time search latency (as far as I understand, via a searchable
>> write cache), but loading data into its index takes almost three times
>> longer (benchmark in a blog post in Japanese
>> <http://blog.createfield.com/entry/2014/07/22/080958>; it appears to
>> use a Wikipedia XML dump, though I'm not sure whether it's the English
>> one).
>>
>> I've created an incomplete prototype of a searchable write cache in my
>> pet project <https://github.com/kk00ss/Rhinodog>, and it takes two
>> times longer to index a fraction of Wikipedia using the same
>> EnglishAnalyzer from lucene.analysis (there is probably room for
>> optimization). While loading data into Lucene I didn't reuse Document
>> instances. The searchable write cache was implemented as a set of
>> persistent Scala SortedMap[TermKey, Measure] instances, one per logical
>> core, where TermKey is defined as TermKey(termID: Int, docID: Long) and
>> Measure is just frequency and norm (but could be extended).
>>
>> Do you think it's worth the slowdown? If so, I'm interested in learning
>> how this part of Lucene works while implementing this feature. However,
>> it is unclear to me how hard it would be to change the existing
>> implementation. I cannot wrap my head around TermHash and the whole
>> flush process: is there any documentation, or are there good blog posts
>> to read about it?
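[For readers following along: the searchable write-cache design Konstantin describes above can be sketched in a few lines. This is a hypothetical Java illustration of that idea, not Lucene's actual in-memory structures; the names WriteCacheSketch, TermKey, and Measure mirror the Scala prototype's description, and the use of ConcurrentSkipListMap in place of Scala's persistent SortedMap is an assumption. Because the map is sorted by (termID, docID), all postings for one term are contiguous, so a single-term query is just a range scan.]

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch of a searchable write cache, mirroring the
// SortedMap[TermKey, Measure] design described in the thread above.
public class WriteCacheSketch {

    // Composite key: sorting by (termId, docId) keeps all postings
    // for one term contiguous in the map.
    record TermKey(int termId, long docId) implements Comparable<TermKey> {
        public int compareTo(TermKey o) {
            int c = Integer.compare(termId, o.termId);
            return c != 0 ? c : Long.compare(docId, o.docId);
        }
    }

    // Per-document stats: frequency and norm, as in the prototype;
    // could be extended with positions, payloads, etc.
    record Measure(int freq, float norm) {}

    private final ConcurrentSkipListMap<TermKey, Measure> cache =
            new ConcurrentSkipListMap<>();

    void add(int termId, long docId, int freq, float norm) {
        cache.put(new TermKey(termId, docId), new Measure(freq, norm));
    }

    // Single-term lookup: a range view over the term's contiguous slice.
    Map<TermKey, Measure> postings(int termId) {
        return cache.subMap(
                new TermKey(termId, Long.MIN_VALUE), true,
                new TermKey(termId, Long.MAX_VALUE), true);
    }

    public static void main(String[] args) {
        WriteCacheSketch c = new WriteCacheSketch();
        c.add(7, 1L, 3, 1.0f);
        c.add(7, 5L, 1, 0.5f);
        c.add(9, 1L, 2, 1.0f);
        System.out.println(c.postings(7).size()); // prints 2
    }
}
```

[One design note this sketch makes concrete: a plain hash map from termId to postings would serve single-term lookups just as well; the sorted order only pays off for multi-term operations such as prefix or range queries, which matches Mike's point above that sorted terms are really only needed for multi-term queries.]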
