Thank you! Opened https://issues.apache.org/jira/browse/LUCENE-9378 to
address this.
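
For anyone following along, here is a rough, self-contained sketch of the
kind of mode switch Mike suggests in the quoted message below. It is not
Lucene code: the class name BinaryValueModeSketch, its Mode enum, and the
encode/decode methods are all hypothetical, and java.util.zip stands in for
whatever compression the real doc values format uses. It only illustrates
the tradeoff being discussed: UNCOMPRESSED stores the bytes as-is and pays
nothing at read time, while COMPRESSED shrinks the data but has to
decompress on every read.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical stand-in, NOT Lucene's API: a per-format mode switch for binary values.
final class BinaryValueModeSketch {

    // Hypothetical enum mirroring the COMPRESSED / UNCOMPRESSED options from the thread.
    enum Mode { COMPRESSED, UNCOMPRESSED }

    private final Mode mode;

    BinaryValueModeSketch(Mode mode) {
        this.mode = mode;
    }

    // Write path: either copy the bytes verbatim or deflate them.
    byte[] encode(byte[] value) {
        if (mode == Mode.UNCOMPRESSED) {
            return value.clone();
        }
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(value);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Read path: this is where the per-hit decompression cost would show up.
    byte[] decode(byte[] stored) {
        if (mode == Mode.UNCOMPRESSED) {
            return stored.clone();
        }
        Inflater inflater = new Inflater();
        inflater.setInput(stored);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed value", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // A highly repetitive value, so compression has something to work with.
        byte[] value = new byte[1000];
        Arrays.fill(value, (byte) 'a');

        for (Mode m : Mode.values()) {
            BinaryValueModeSketch format = new BinaryValueModeSketch(m);
            byte[] stored = format.encode(value);
            boolean roundTrips = Arrays.equals(value, format.decode(stored));
            System.out.println(m + ": stored " + stored.length
                    + " bytes, round trip ok = " + roundTrips);
        }
    }
}
```

In a real Codec the choice would presumably be made once, when the format
is constructed, much like picking Mode.BEST_SPEED or Mode.BEST_COMPRESSION
for stored fields today. How that would be wired into the doc values format
is exactly what the Jira issue is meant to settle.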

Viral Gandhi

On Wed, 20 May 2020 at 15:27, Michael McCandless <[email protected]>
wrote:

> I think we could do this at the Codec level?
>
> For example, for stored fields, the current default format
> (Lucene50StoredFieldsFormat) has two modes, Mode.BEST_SPEED and
> Mode.BEST_COMPRESSION, that are easy for the user to pick.  Both modes use
> compression, just at varying levels.
>
> I think for the (new) Lucene84DocValuesFormat, which looks like it will
> always compress binary DVs, we could similarly add a Mode, maybe with two
> options, COMPRESSED and UNCOMPRESSED?
>
> This way it is fairly simple for users to create a custom Codec
> subclassing the default Codec and pick the format they want.  And we can
> try to figure out which way it should default.  Our (Amazon's customer
> facing product search) usage is admittedly unusual, heavily relying on
> BINARY doc values performance per hit collected during matching.  Other
> search applications might not see a 40% hit to their red-line throughput :)
>
> Viral could you please open a Jira issue to find a way to make this
> configurable?  We can hash out the details on the issue ...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Wed, May 20, 2020 at 5:38 PM Michael Sokolov <[email protected]>
> wrote:
>
>> I guess the compression we added to binary doc values, and for
>> postings, seems to have hurt performance in a way that wasn't detected
>> in testing when those changes were made, or if it was detected, I
>> don't recall any discussion about the tradeoff being made. Now that we
>> do see there is a tradeoff, I think we need to have that discussion
>> though. I can see that having compression can be a nice win for
>> indexes that are huge and may be memory bound, since it can help avoid
>> I/O, but for a low-latency case where the index is already memory
>> resident, we are willing to pay the price of a larger index to avoid
>> the cost of decompression. I think we need to find some way of
>> handling both cases. I think our design principle should be to expose
>> as few knobs as we can, but in this case I don't see how the code can
>> make the decision whether to compress or not, since it really depends
>> on external design considerations (how big will the index grow? how
>> much RAM will the servers have? what query latency is tolerable?)
>> Given that, I think we should find a way to expose some kind of
>> configurability. Maybe as a first step, rather than making this
>> configurable for each DocValuesType, we could offer a global
>> configuration in IndexWriterConfig (compressFields=true/false)?
>>
>> On Tue, May 19, 2020 at 1:05 AM David Smiley <[email protected]>
>> wrote:
>> >
>> > I don't have a direct answer for you, but your message causes me to
>> > reflect on how Lucene does *not* give users a choice of format on a
>> > per-type basis (e.g. BinaryDocValues vs NumericDocValues, etc.), which
>> > is annoying.  Ideally the previous simple format would be available for
>> > you to choose, but it is not.  Lucene lets you mix & match
>> > PostingsFormats, stored fields formats, term vectors formats, points
>> > format.  But when it comes to DocValues, it's an all-encompassing format
>> > for five different structures.  So you take it or leave it; all or
>> > nothing.  My colleague filed
>> > https://issues.apache.org/jira/browse/LUCENE-9236 on this matter; feel
>> > free to comment there with your opinion if you have one.
>> >
>> > ~ David
>> >
>> >
>> > On Mon, May 18, 2020 at 7:52 PM Viral Gandhi <[email protected]>
>> > wrote:
>> >>
>> >> Hi,
>> >> I tried upgrading to Lucene 8.5.1 from 8.4 and ran our internal
>> >> benchmarking. We noticed that with this upgrade our QPS dropped by more
>> >> than 40%, and latencies were also affected. After doing some profiling
>> >> and reverting the LUCENE-9211 commit related to BinaryDocValues
>> >> compression, we recovered ~30% of the loss. Has anyone encountered a
>> >> similar situation?
>> >>
>> >> We rely on BinaryDocValues very heavily. Should this newly introduced
>> >> compression be opt-in?
>> >>
>> >> Also, are there any pointers for recovering the remaining 10% loss?
>> >> When I run the benchmark on an 8.4 index with 8.5.1 code, performance
>> >> is very similar to 8.4.
>> >>
>> >> Thanks,
>> >> Viral Gandhi
>>
