On Sat, Jul 9, 2016 at 5:44 PM, Konstantin <[email protected]> wrote:

> Thanks.
> I'm aware of the current implementation of merging in Lucene at a high
> level. Yes, Rhinodog uses a B-tree for storing everything; it is a
> bottleneck on writes, but it's almost as fast on reads as direct access to
> a location on disk.
>
Slower writes for faster reads is the right tradeoff for a search engine,
in general, IMO.

> (With a cold cache on an SSD, reads take less time than decoding blocks.)
> But maybe there is a way to decouple merging/storing + codecs from
> everything else? Just quickly looking over the sources, it actually seems
> like a hard task to me, with yet unclear benefits. I'll compare these
> compaction strategies.
>
You mean like Lucene's Codec abstractions?
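
Very roughly, the Codec API decouples the on-disk index format (postings,
doc values, stored fields, etc.) from IndexWriter, merging and search.
Here's a minimal sketch of plugging in a custom postings format
(MyPostingsFormat below is a hypothetical placeholder; everything else is
delegated to the default codec):

  import org.apache.lucene.codecs.Codec;
  import org.apache.lucene.codecs.FilterCodec;
  import org.apache.lucene.codecs.PostingsFormat;

  public class MyCodec extends FilterCodec {

    // Hypothetical custom postings format; all other parts of the index
    // are handled by the wrapped default codec.
    private final PostingsFormat postings = new MyPostingsFormat();

    public MyCodec() {
      super("MyCodec", Codec.getDefault());
    }

    @Override
    public PostingsFormat postingsFormat() {
      return postings;
    }
  }

You would then enable it per index with
IndexWriterConfig.setCodec(new MyCodec()).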

> Also, I have a question about search performance - I'm most likely
> testing it in the wrong way - do you test performance on real user
> queries? What kinds of queries are more likely: those where the query
> words have similar frequencies, or those where the words' frequencies
> differ by orders of magnitude?
>
It's not possible to answer this :(

Real user queries, run against the real documents those users were
querying, are by far the best, but they are not easy to come by.

In the nightly Wikipedia benchmarks, e.g.
http://home.apache.org/~mikemccand/lucenebench/Phrase.html , I use
synthetically generated queries, derived from the index, to try to mix up
the relative frequencies of the terms.
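
The rough idea (this is a sketch, not the actual benchmark code; the
"body" field, the bucket choices and the Lucene 6.x MultiFields API are
assumptions) is to walk the index's terms, bucket them by the order of
magnitude of their docFreq, and then build queries either from the same
bucket (similar frequencies) or from far-apart buckets (frequencies that
differ by orders of magnitude):

  import java.nio.file.Paths;
  import java.util.*;
  import org.apache.lucene.index.*;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.util.BytesRef;

  public class SyntheticQueryGen {
    public static void main(String[] args) throws Exception {
      try (DirectoryReader reader =
               DirectoryReader.open(FSDirectory.open(Paths.get(args[0])))) {
        // Assumes the index has an indexed "body" field.
        Terms terms = MultiFields.getTerms(reader, "body");
        TermsEnum te = terms.iterator();
        // Bucket terms by order of magnitude of their document frequency.
        Map<Integer, List<String>> byMagnitude = new HashMap<>();
        BytesRef term;
        while ((term = te.next()) != null) {
          int bucket = (int) Math.floor(Math.log10(te.docFreq()));
          byMagnitude.computeIfAbsent(bucket, k -> new ArrayList<>())
                     .add(term.utf8ToString());
        }
        Random random = new Random(17);
        List<String> rare = byMagnitude.getOrDefault(1, Collections.emptyList());
        List<String> common = byMagnitude.getOrDefault(4, Collections.emptyList());
        if (!rare.isEmpty() && !common.isEmpty()) {
          // Skewed two-term query: one rare term + one common term.
          System.out.println(rare.get(random.nextInt(rare.size())) + " "
              + common.get(random.nextInt(common.size())));
          // Balanced two-term query: both terms of similar frequency.
          System.out.println(common.get(random.nextInt(common.size())) + " "
              + common.get(random.nextInt(common.size())));
        }
      }
    }
  }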

Mike McCandless

http://blog.mikemccandless.com
