Switching to "FST50" ought to bring back much of the benefit of "Memory".

On Thu, Aug 23, 2018 at 5:15 PM Adrien Grand <[email protected]> wrote:

> The commit that caused this slowdown might be
> https://github.com/mikemccand/luceneutil/commit/1d8460f342f269c98047def9f9eb76213acae5d9
> .
>
> Indeed, we no longer have anything that performs as well, but I'm not
> sure this is a big deal. I suspect that postings format had few users,
> for two reasons: it was not covered by backward compatibility (like any
> codec but the default one), and it used a lot of RAM. In a number of
> cases, we try to fold the benefits of alternative codecs into the default
> codec. For instance, we used to have a "pulsing" postings format that
> could record postings in the terms dictionary in order to save one disk
> seek. We ended up folding this feature into the default postings format
> by enabling it only on terms that have a document frequency of 1 and
> index_options=DOCS_ONLY, so that it is always used with primary keys.
> For the Memory postings format, doing the same didn't really make sense:
> it was so much faster because it loaded much more information into RAM,
> which we don't want to do with the default codec.
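
To make the pulsing idea above concrete, here is a minimal toy sketch in
plain Java. It is not Lucene's implementation or file format: the class
PulsingSketch, its methods, and the in-memory "postings store" are all
hypothetical names made up for illustration. It only shows the rule
described above: a term with document frequency 1 in a docs-only field
keeps its single doc ID inside the terms dictionary entry, so a lookup
never touches the separate postings store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of "pulsing" (hypothetical, not Lucene code). */
public class PulsingSketch {

    /** Terms-dictionary entry: holds either an inlined doc ID or a pointer to postings. */
    private static final class TermEntry {
        final int docFreq;
        final int inlinedDocId;   // meaningful only when postingsOffset < 0
        final int postingsOffset; // index into postingsStore, or -1 when inlined

        TermEntry(int docFreq, int inlinedDocId, int postingsOffset) {
            this.docFreq = docFreq;
            this.inlinedDocId = inlinedDocId;
            this.postingsOffset = postingsOffset;
        }
    }

    private final Map<String, TermEntry> termsDict = new HashMap<>();
    // Stands in for the separate postings file; each read models one extra seek.
    private final List<int[]> postingsStore = new ArrayList<>();
    private int postingsReads = 0;

    /** Adds a term; inlines the posting only for docFreq == 1 in a docs-only field. */
    public void addTerm(String term, int[] docIds, boolean docsOnly) {
        if (docIds.length == 1 && docsOnly) {
            termsDict.put(term, new TermEntry(1, docIds[0], -1));
        } else {
            postingsStore.add(docIds.clone());
            termsDict.put(term, new TermEntry(docIds.length, -1, postingsStore.size() - 1));
        }
    }

    /** Returns the doc IDs for a term, counting reads of the postings store. */
    public int[] lookup(String term) {
        TermEntry entry = termsDict.get(term);
        if (entry == null) {
            return new int[0];
        }
        if (entry.postingsOffset < 0) {
            return new int[] { entry.inlinedDocId }; // answered from the terms dictionary alone
        }
        postingsReads++;
        return postingsStore.get(entry.postingsOffset);
    }

    public int postingsReads() {
        return postingsReads;
    }

    public static void main(String[] args) {
        PulsingSketch index = new PulsingSketch();
        index.addTerm("id:42", new int[] {7}, true);            // primary-key style term
        index.addTerm("body:lucene", new int[] {1, 7, 9}, false);
        System.out.println(index.lookup("id:42")[0] + " reads=" + index.postingsReads());
        System.out.println(index.lookup("body:lucene").length + " reads=" + index.postingsReads());
    }
}
```

In this model a primary-key lookup costs no read of the postings store,
while a term that matches several documents still pays for one.
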
>
> On Thu, Aug 23, 2018 at 22:40, Michael Sokolov <[email protected]> wrote:
>
>> I happened to stumble across this chart
>> https://home.apache.org/~mikemccand/lucenebench/PKLookup.html showing a
>> pretty drastic drop in this benchmark on 5/13. I looked at the commits
>> between the previous run and this one and tried to git bisect the
>> problem using the benchmarks as a test, but that proved quite difficult
>> because of a breaking change re: MemoryCodec that also required
>> corresponding changes in the benchmark code.
>>
>> In the end, I think removing MemoryCodec is what caused the drop in perf
>> here, based on this comment in benchmark code:
>>
>> '2011-06-26'
>>    Switched to MemoryCodec for the primary-key 'id' field so that lookups
>> (either for PKLookup test or for deletions during reopen in the NRT test)
>> are fast, with no IO.  Also switched to NRTCachingDirectory for the NRT
>> test, so that small new segments are written only in RAM.
>>
>> I don't really understand the implications here beyond benchmarks, but it
>> does seem that perhaps some essential high-performing capability has been
>> lost?  Is there some equivalent thing remaining after MemoryCodec's removal
>> that can be used for primary keys?
>>
>> -Mike
>>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
