Hi Nik,

The trade-off is indeed not an easy one. First, the default terms dictionary can already save some disk seeks: it keeps the prefixes of the terms in the terms dictionary in an in-memory FST, so it can avoid going to disk when the term you are looking up cannot match that FST. A bloom filter might save a few additional disk seeks but, as you said, it is pretty memory-intensive, and sometimes that memory would be better spent on the filesystem cache.
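For a rough sense of the memory side, here is a back-of-the-envelope sketch in Python. It uses the textbook sizing formula for an optimally configured Bloom filter, which is an assumption on my part: the bloom postings format may size its filters differently, and the function names and the example term and segment counts below are made up for illustration. It does land close to the "about 10 bits per term per segment at around 1%" figure you quoted.

```python
import math


def bloom_bits_per_term(false_positive_rate: float) -> float:
    """Bits per element for an optimally sized Bloom filter.

    Textbook formula: m / n = -ln(p) / (ln 2)^2.
    Assumption: the real codec may use different sizing heuristics.
    """
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be strictly between 0 and 1")
    return -math.log(false_positive_rate) / (math.log(2) ** 2)


def bloom_heap_bytes(terms_per_segment: int, segments: int,
                     false_positive_rate: float = 0.01) -> float:
    """Estimated heap for one field, with one filter per segment.

    A term that appears in several segments is counted once per segment,
    which is why many segments make the cost add up.
    """
    total_bits = terms_per_segment * segments * bloom_bits_per_term(false_positive_rate)
    return total_bits / 8


if __name__ == "__main__":
    # Hypothetical field: 1M terms in each of 10 segments, 1% false positives.
    print(f"{bloom_bits_per_term(0.01):.2f} bits per term per segment")
    print(f"{bloom_heap_bytes(1_000_000, 10) / 1e6:.1f} MB of heap")
```

Comparing that estimate with how much filesystem cache the same memory would buy is one way to frame the decision for a given field.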

On Thu, Jul 17, 2014 at 4:25 PM, Nikolas Everett wrote:
> Has anyone had success adding a bloom filter to the codec for any of their
> fields?
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-codec.html#bloom-postings
>
> I imagine it'd help reduce IO from (non multi-term) queries that
> frequently don't match. Like if you have a field that is very specific and
> useful for searching but very rarely matches anything.
>
> It looks like the cost is in the range of 10 bits of heap per term per
> segment for a false positive probability around 1%. Meaning it'd be pretty
> high if the index had lots of terms - especially if they were in many
> segments. But it'd be about 10 bits per value if the values were mostly
> unique.
>
> Nik

--
Adrien Grand
