It looks like the compression we added to binary doc values, and to postings, has hurt performance in a way that wasn't detected in testing when those changes were made. If it was detected, I don't recall any discussion of the tradeoff. Now that we can see there is a tradeoff, I think we need to have that discussion.

Compression can be a nice win for huge indexes that may be memory bound, since it helps avoid I/O. But in a low-latency case where the index is already memory resident, we are willing to pay the price of a larger index to avoid the cost of decompression. I think we need to find some way of handling both cases.

Our design principle should be to expose as few knobs as we can, but in this case I don't see how the code can decide whether to compress or not, since that really depends on external design considerations (how big will the index grow? how much RAM will the servers have? what query latency is tolerable?). Given that, I think we should find a way to expose some kind of configurability. Maybe as a first step, rather than making this configurable for each DocValuesType, we could offer a global configuration in IndexWriterConfig (compressFields=true/false)?
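To make the compressFields idea a bit more concrete, here is a minimal standalone sketch of a single global flag deciding at write time whether a binary value is stored raw or compressed. Everything in it is an assumption for illustration: WriterSettings, encode and decode are hypothetical names, not Lucene APIs, and java.util.zip.Deflater stands in for whatever compression the codec actually uses.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch only; none of these names exist in Lucene. */
public class CompressFieldsSketch {

    /** Stand-in for a global writer option such as compressFields=true/false. */
    public static final class WriterSettings {
        public final boolean compressFields;

        public WriterSettings(boolean compressFields) {
            this.compressFields = compressFields;
        }
    }

    /** Stores a value as one header byte (0 = raw, 1 = deflated) followed by the payload. */
    public static byte[] encode(byte[] value, WriterSettings settings) {
        if (!settings.compressFields) {
            // Low-latency case: larger on disk, but no decompression at read time.
            byte[] out = new byte[value.length + 1];
            System.arraycopy(value, 0, out, 1, value.length);
            return out;
        }
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(value);
        deflater.finish();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        buf.write(1);
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            buf.write(chunk, 0, n);
        }
        deflater.end();
        return buf.toByteArray();
    }

    /** Reads the header byte and decompresses only when the writer compressed the value. */
    public static byte[] decode(byte[] stored) {
        if (stored[0] == 0) {
            return Arrays.copyOfRange(stored, 1, stored.length);
        }
        Inflater inflater = new Inflater();
        inflater.setInput(stored, 1, stored.length - 1);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or invalid compressed value");
                }
                buf.write(chunk, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return buf.toByteArray();
    }
}
```

Because the header byte records how each value was written, a reader in this sketch can handle both kinds of value without knowing the writer's setting. Whether a real codec could mix the two within one index is a separate question that the sketch does not address.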
On Tue, May 19, 2020 at 1:05 AM David Smiley <[email protected]> wrote:
>
> I don't have a direct answer for you, but your message causes me to reflect
> on how Lucene does *not* give users choice of format on a per-type basis
> (e.g. BinaryDocValues vs NumericDocValues vs etc.), which is annoying.
> Ideally the previous simple format would be available for you to choose, but
> it is not. Lucene lets you mix & match PostingsFormats, stored fields
> formats, term vectors formats, points format. But when it comes to
> DocValues, it's an all-encompassing format for five different structures. So
> you take it or leave it; all or nothing. My colleague filed
> https://issues.apache.org/jira/browse/LUCENE-9236 on this matter; feel free
> to comment there with your opinion if you have one.
>
> ~ David
>
>
> On Mon, May 18, 2020 at 7:52 PM Viral Gandhi <[email protected]> wrote:
>>
>> Hi,
>> I tried upgrading to Lucene 8.5.1 from 8.4 and ran our internal
>> benchmarking. We noticed that with this upgrade our QPS dropped more than
>> 40% and also affected latencies. After doing some profiling and reverting
>> LUCENE-9211 commit related to BinaryDocValues compression, we recovered ~30%
>> of the loss. Did anyone encounter a similar situation?
>>
>> We rely on BinaryDocValues very heavily. Should this newly introduced
>> compression be optional to opt-in?
>>
>> Also, any other pointers on recovering the remaining 10% loss? When I run
>> benchmark on 8.4 index with 8.5.1 code, performance is very similar to 8.4.
>>
>> Thanks,
>> Viral Gandhi
