This is not something I am very familiar with, but https://issues.apache.org/jira/browse/LUCENE-2312 tracked an effort to improve NRT latency by adding the ability to search directly against the IndexWriter's indexing buffer.
On Tue, Jul 12, 2016 at 16:11, Konstantin <[email protected]> wrote:
> Hello everyone,
> As far as I understand, NRT requires flushing a new segment to disk. Is it
> correct that the write cache is not searchable?
>
> A competing search library, groonga
> <http://groonga.org/docs/characteristic.html>, claims much smaller
> realtime search latency (as far as I understand, via a searchable write
> cache), but loading data into their index takes almost three times
> longer (benchmark in a blog post in Japanese
> <http://blog.createfield.com/entry/2014/07/22/080958>; it seems to be a
> Wikipedia XML dump, I'm not sure if it's the English one).
>
> I've created an incomplete prototype of a searchable write cache in my pet
> project <https://github.com/kk00ss/Rhinodog>, and it takes two times
> longer to index a fraction of Wikipedia using the same EnglishAnalyzer from
> lucene.analysis (probably there is room for optimization). While loading
> data into Lucene I didn't reuse Document instances. The searchable write
> cache was implemented as a bunch of persistent Scala
> SortedMap[TermKey, Measure]s, one per logical core, where TermKey is
> defined as TermKey(termID: Int, docID: Long) and Measure is just a
> frequency and a norm (but could be extended).
>
> Do you think it's worth the slowdown? If so, I'm interested to learn how
> this part of Lucene works while implementing this feature. However, it is
> unclear to me how hard it would be to change the existing implementation. I
> cannot wrap my head around TermsHash and the whole flush process - are
> there any documentation or good blog posts to read about it?
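For readers unfamiliar with the design described in the quoted message, here is a minimal, hypothetical sketch of such a per-core searchable write cache (the class and method names are illustrative, not the actual Rhinodog code): postings live in a sorted map keyed by (termID, docID), so all postings for one term can be read back as an ordered range scan without flushing to disk.

```scala
import scala.collection.immutable.SortedMap

// Key and value shapes as described in the message above.
final case class TermKey(termID: Int, docID: Long)
final case class Measure(frequency: Int, norm: Float)

object TermKey {
  // Order by term first, then by document, so one term's
  // postings form a contiguous range in the sorted map.
  implicit val ordering: Ordering[TermKey] =
    Ordering.by((k: TermKey) => (k.termID, k.docID))
}

// Hypothetical write cache; the prototype keeps one such
// structure per logical core.
class WriteCache {
  private var cache = SortedMap.empty[TermKey, Measure]

  def add(termID: Int, docID: Long, m: Measure): Unit =
    cache += (TermKey(termID, docID) -> m)

  // All postings for termID, ordered by docID: a half-open
  // range scan [ (termID, 0), (termID + 1, 0) ).
  def postings(termID: Int): Iterable[(TermKey, Measure)] =
    cache.range(TermKey(termID, 0L), TermKey(termID + 1, 0L))
}
```

The trade-off the message asks about is visible here: every added posting pays the cost of a sorted-map insertion (unlike Lucene's append-only in-memory postings), in exchange for the buffer being immediately searchable.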
