Re: FW: Is there a really performant way to store a full 32-bit int in doc values?

Robert Muir Tue, 08 Oct 2013 07:41:38 -0700

I don't understand the question. We tell you that it's inefficient to do
this (see the javadocs of FloatDocValuesField), and you are doing it
anyway and are then surprised that it's not as fast as you would like? :)


Did you try Mike's suggestion yet and turn off compression?

On Tue, Oct 8, 2013 at 9:31 AM,  <[email protected]> wrote:
> Hi All (and especially Robert),
>
> Lucene NumericDocValues seems to operate more slowly than we would expect.  In
> our application, we're using it to store coordinate values, which we retrieve
> to compute a distance.  While doing timings to determine the impact of
> including a sqrt in the calculation, we noted that the Lucene overhead itself
> overwhelmed pretty much anything we did in the ValueSource.
>
> One of our engineers did performance testing (code attached, hope it gets
> through), which shows what we are talking about.  Please see the thread
> below.  The question is: why is Lucene 2.5x slower than direct buffer
> access for this case?  And is there anything we can do within the Lucene
> paradigm to get our performance back closer to the direct buffer case?
>
> Karl
>
> -----Original Message-----
> From: Ziech Christian (HERE/Berlin)
> Sent: Tuesday, October 08, 2013 9:08 AM
> To: Wright Karl (HERE/Cambridge)
> Subject: AW: Is there a really performant way to store a full 32-bit int in 
> doc values?
>
> Hi,
>
> I have now tested the approach of using the NumericDocValues directly, and
> it does help by about 20% compared to the original Lucene numbers. Lucene is
> still 2.5x slower than using a DirectBuffer alone, but it helps. The funny
> thing is that with Lucene the square root is almost free, which is probably
> explained by the CPU calculating the square root while other things are
> computed: since my micro-benchmark doesn't need the result for a while, the
> CPU can happily do other work in the meantime. Since we also have a lot of
> other query aspects, I assume we'd get that gain either way, so estimating
> about 30-50ms for the square root when scoring 25M documents should be about
> accurate. So what is Lucene doing that causes it to be 3 times slower than
> the naive approach? And why is that impact so big compared to that of a
> simple square root (which slows things down by ~20%, assuming the 30ms with
> more complex actions)? I mean, 20% vs 200% is an order of magnitude!
> As a side note: storing the values as an int when using a DirectBuffer
> doesn't seem helpful, I assume because we have to cast the int to float
> either way later.
>
> BR
>   Christian
>
> PS: The new numbers are:
> Scoring 25000000 documents with direct float buffers (without square root) took 190
> Scoring 25000000 documents with direct float buffers (without square root) took 171
> Scoring 25000000 documents with direct float buffers (without square root) took 172
> Scoring 25000000 documents with direct float buffers (and a square root) took 281
> Scoring 25000000 documents with direct float buffers (and a square root) took 280
> Scoring 25000000 documents with direct float buffers (and a square root) took 266
> Scoring 25000000 documents with a lucene float value source (without square root) took 1045
> Scoring 25000000 documents with a lucene float value source (without square root) took 625
> Scoring 25000000 documents with a lucene float value source (without square root) took 630
> Scoring 25000000 documents with a lucene float value source (and a square root) took 661
> Scoring 25000000 documents with a lucene float value source (and a square root) took 670
> Scoring 25000000 documents with a lucene float value source (and a square root) took 665
> Scoring 25000000 documents with direct int buffers (without square root) took 218
> Scoring 25000000 documents with direct int buffers (without square root) took 219
> Scoring 25000000 documents with direct int buffers (without square root) took 204
> Scoring 25000000 documents with a lucene numeric values (without square root) source took 1123
> Scoring 25000000 documents with a lucene numeric values (without square root) source took 500
> Scoring 25000000 documents with a lucene numeric values (without square root) source took 499
> Scoring 25000000 documents with a lucene numeric values (and a square root) source took 531
> Scoring 25000000 documents with a lucene numeric values (and a square root) source took 531
> Scoring 25000000 documents with a lucene numeric values (and a square root) source took 535
>
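The ratios discussed in this message can be recomputed from the posted timings. The sketch below takes the median of each three-run series (an assumption made here so that the slow first Lucene run counts as warm-up) and assumes the figures are milliseconds, as the prose suggests. The class and method names are illustrative only.

```java
import java.util.Arrays;

final class TimingSummary {
    /** Median of a series; for an even count this returns the upper middle value. */
    static long median(long... millis) {
        long[] sorted = millis.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        // Figures copied from the benchmark output quoted in this message.
        long directFloat = median(190, 171, 172);
        long directFloatSqrt = median(281, 280, 266);
        long valueSource = median(1045, 625, 630);
        long numeric = median(1123, 500, 499);
        long numericSqrt = median(531, 531, 535);

        System.out.printf("ValueSource vs direct buffer:      %.2fx%n",
                (double) valueSource / directFloat);
        System.out.printf("NumericDocValues vs direct buffer: %.2fx%n",
                (double) numeric / directFloat);
        System.out.printf("Gain from skipping ValueSource:    %.1f%%%n",
                100.0 * (valueSource - numeric) / valueSource);
        System.out.println("sqrt cost, direct buffer: " + (directFloatSqrt - directFloat));
        System.out.println("sqrt cost, NumericDocValues: " + (numericSqrt - numeric));
    }
}
```

Using the median rather than the mean keeps the first, much slower Lucene run from dominating the comparison.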
>
> ________________________________________
> From: Wright Karl (HERE/Cambridge)
> Sent: Monday, October 07, 2013 9:22 AM
> To: Ziech Christian (HERE/Berlin)
> Subject: FW: Is there a really performant way to store a full 32-bit int in 
> doc values?
>
> -----Original Message-----
> From: ext Michael McCandless [mailto:[email protected]]
> Sent: Monday, October 07, 2013 8:28 AM
> To: Wright Karl (HERE/Cambridge)
> Subject: Re: Is there a really performant way to store a full 32-bit int in 
> doc values?
>
> Well, it is a micro-benchmark ... so it'd be better to test in the wider/full 
> context of the application?
>
> I'm also a little worried that you go through ValueSource instead of 
> interacting directly with the NumericDocValues instance; it's just an 
> additional level of indirection that may confuse hotspot.  But it really 
> ought not be so bad ...
>
> Under the hood we encode a float to an int using Float.floatToRawIntBits; it 
> could be that this doesn't work well w/ the compression we then do on the 
> ints by default?  I'm curious which impl the Lucene45DocValuesConsumer is 
> using in your case.  Looks like you are using random floats, so I'd expect 
> it's using DELTA_COMPRESSED.
>
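To illustrate the encoding described above, here is a minimal sketch using only the JDK. It round-trips floats through Float.floatToRawIntBits and then estimates how many bits per value remain if a codec subtracts the minimum and bit-packs the rest. That delta-and-pack model is an assumption made for illustration, not a description of what Lucene45DocValuesConsumer actually does, and the coordinate range is hypothetical.

```java
import java.util.Random;

final class FloatBitsDemo {
    /** Encode a float as its raw 32-bit pattern, widened to a long for storage. */
    static long encode(float value) {
        return Float.floatToRawIntBits(value); // the int is sign-extended to long
    }

    /** Decode a stored long back into the original float. */
    static float decode(long stored) {
        return Float.intBitsToFloat((int) stored);
    }

    /** Bits needed to store values in [min, max] as unsigned offsets from min. */
    static int bitsForRange(long min, long max) {
        long delta = max - min;
        return delta == 0 ? 1 : 64 - Long.numberOfLeadingZeros(delta);
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (int i = 0; i < 1_000_000; i++) {
            // Hypothetical coordinate in meters, positive or negative.
            float coord = (random.nextFloat() - 0.5f) * 12_000_000f;
            long stored = encode(coord);
            if (decode(stored) != coord) {
                throw new AssertionError("round trip failed for " + coord);
            }
            min = Math.min(min, stored);
            max = Math.max(max, stored);
        }
        System.out.println("bits per value after subtracting the minimum: "
                + bitsForRange(min, max));
    }
}
```

Because the sign bit and exponent sit in the high bits of the pattern, values of mixed sign spread across most of the 32-bit range, so a scheme of this kind has little to save on random floats while still paying for the decode.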
> It'd be a simple test to just make your own DVFormat using raw 32 bit ints, 
> to see how much that helps.
>
> But, yes, I would just email the list and see if there are other ideas?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Mon, Oct 7, 2013 at 7:14 AM,  <[email protected]> wrote:
>> Hi Mike,
>>
>>
>>
>> Before I post to the general list, do you see any problem with our
>> testing methodology?
>>
>>
>>
>> Basically, we conclude that by far the most expensive thing is
>> retrieving the NumericDocValue value.  This currently overwhelms any
>> expensive operations we might do in the scoring ourselves, which is
>> why we're looking for potential improvements in that area.
>>
>>
>>
>> Do you agree with the assessment?
>>
>> Karl
>>
>>
>>
>> From: Ziech Christian (HERE/Berlin)
>> Sent: Friday, October 04, 2013 11:09 PM
>> To: Wright Karl (HERE/Cambridge)
>> Subject: AW: Is there a really performant way to store a full 32-bit
>> int in doc values?
>>
>>
>>
>> Hi,
>>
>> maybe it's best if I share where I got my numbers from: I have
>> written a small test (which was originally meant only to test the
>> Math.sqrt() impact for 10M scorings).
>>
>> The output is (I looped over the search invocation to give Lucene a
>> chance to load everything):
>> Scoring 25000000 documents with direct buffers (without square root) took 203
>> Scoring 25000000 documents with direct buffers (without square root) took 179
>> Scoring 25000000 documents with direct buffers (without square root) took 172
>> Scoring 25000000 documents with direct buffers (and a square root) took 292
>> Scoring 25000000 documents with direct buffers (and a square root) took 289
>> Scoring 25000000 documents with direct buffers (and a square root) took 289
>> Scoring 25000000 documents with a lucene value (without square root) source took 1045
>> Scoring 25000000 documents with a lucene value (without square root) source took 656
>> Scoring 25000000 documents with a lucene value (without square root) source took 660
>> Scoring 25000000 documents with a lucene value (without square root) source took 658
>> Scoring 25000000 documents with a lucene value (without square root) source took 663
>> Scoring 25000000 documents with a lucene value (and a square root) source took 711
>> Scoring 25000000 documents with a lucene value (and a square root) source took 710
>> Scoring 25000000 documents with a lucene value (and a square root) source took 713
>> Scoring 25000000 documents with a lucene value (and a square root) source took 711
>> Scoring 25000000 documents with a lucene value (and a square root) source took 714
>>
>> So the impact of a square root is roughly 110ms, while the impact of
>> using the Lucene function values is far higher (between 300 and 350ms,
>> depending on the run). Interestingly, the square root impact is not as
>> high with the Lucene function query for some reason (most likely Java or
>> the CPU can simply optimize the very simple scorer best).
>>
>> I did measure the values with a FSDirectory and a RAMDirectory which
>> both essentially yield the same performance. Do you see any problem
>> with the attached code?
>>
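The attached test code is not part of this thread. As a rough idea of what the direct buffer baseline might look like, here is a hypothetical sketch: it keeps X,Y,Z coordinates in an off-heap FloatBuffer, scores every document by squared (or true) distance to a query point, and repeats the loop so the JIT can warm up. All names, the document count, and the data layout are assumptions made for this sketch, not the original benchmark.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.util.Random;

final class DirectBufferScoring {
    /** Allocate an off-heap buffer holding x, y, z for each document. */
    static FloatBuffer randomCoordinates(int docCount, long seed) {
        FloatBuffer buffer = ByteBuffer.allocateDirect(docCount * 3 * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        Random random = new Random(seed);
        for (int i = 0; i < docCount * 3; i++) {
            buffer.put(i, random.nextFloat());
        }
        return buffer;
    }

    /** Sum the (squared) distance from the query point over all documents. */
    static double scoreAll(FloatBuffer coords, int docCount,
                           float qx, float qy, float qz, boolean useSqrt) {
        double total = 0;
        for (int doc = 0; doc < docCount; doc++) {
            float dx = coords.get(3 * doc) - qx;
            float dy = coords.get(3 * doc + 1) - qy;
            float dz = coords.get(3 * doc + 2) - qz;
            double d2 = (double) dx * dx + (double) dy * dy + (double) dz * dz;
            total += useSqrt ? Math.sqrt(d2) : d2;
        }
        return total;
    }

    public static void main(String[] args) {
        int docCount = 1_000_000; // smaller than the 25M in the thread, to fit default limits
        FloatBuffer coords = randomCoordinates(docCount, 42L);
        for (int run = 0; run < 5; run++) { // repeated runs let the JIT warm up
            for (boolean useSqrt : new boolean[] {false, true}) {
                long start = System.nanoTime();
                double total = scoreAll(coords, docCount, 0.5f, 0.5f, 0.5f, useSqrt);
                long millis = (System.nanoTime() - start) / 1_000_000;
                // Printing the total keeps the JIT from discarding the loop as dead code.
                System.out.println("sqrt=" + useSqrt + " took " + millis
                        + " ms (checksum " + total + ")");
            }
        }
    }
}
```

Accumulating and printing a checksum matters in a micro-benchmark like this: without a visible result, the compiler is free to remove the work being timed.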
>> BR
>>   Christian
>>
>> ________________________________
>>
>> From: Wright Karl (HERE/Cambridge)
>> Sent: Friday, October 04, 2013 8:56 PM
>> To: Ziech Christian (HERE/Berlin)
>> Subject: FW: Is there a really performant way to store a full 32-bit
>> int in doc values?
>>
>>
>> FYI
>> Karl
>>
>> Sent from my Windows Phone
>>
>> ________________________________
>>
>> From: ext Michael McCandless
>> Sent: 10/4/2013 4:51 PM
>> To: Wright Karl (HERE/Cambridge)
>> Subject: Re: Is there a really performant way to store a full 32-bit
>> int in doc values?
>>
>> Hmmm, that's interesting that you see decode cost is too high.  Are
>> you sure?
>>
>> Can you email the list?  I'm sure Rob will have suggestions.  The
>> worst case is you make a custom DV format that stores things raw.
>>
>> 4.5 has a new default DocValuesFormat with more compression, but with
>> values stored on disk by default (cached by the OS if you have the
>> RAM) ... I wonder how that would compare to what you're using now.
>>
>> I think the simplest thing to do is to instantiate the
>> Lucene42DocValuesConsumer (renamed to MemoryDVConsumer in 4.5),
>> passing a very high acceptableOverheadRatio?  This should cause
>> packed ints to be upgraded to a byte[], short[], int[], or long[].  If
>> this is still not fast enough, then I suspect a custom DVFormat that just
>> uses int[] directly (avoiding the abstractions of packed ints) is your
>> best shot.
>>
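To make the trade-off concrete, here is a minimal sketch of a bit-packed reader next to a plain int[] read. It is not Lucene's PackedInts implementation; the class, its methods, and the layout (values packed back to back into a long[]) are assumptions chosen to show why each packed read costs shifts, masks, and sometimes a second memory access, while an int[] read is a single load.

```java
final class PackedVsRaw {
    /** Pack values (each fitting in bitsPerValue bits, 1..32) back to back into longs. */
    static long[] pack(int[] values, int bitsPerValue) {
        long[] blocks = new long[(int) (((long) values.length * bitsPerValue + 63) / 64)];
        long mask = (1L << bitsPerValue) - 1;
        for (int i = 0; i < values.length; i++) {
            long bitPos = (long) i * bitsPerValue;
            int block = (int) (bitPos >>> 6);
            int offset = (int) (bitPos & 63);
            long v = values[i] & mask;
            blocks[block] |= v << offset;
            if (offset + bitsPerValue > 64) {
                // The value straddles two longs: spill the high bits into the next block.
                blocks[block + 1] |= v >>> (64 - offset);
            }
        }
        return blocks;
    }

    /** Read one packed value: index math, a shift, a mask, and possibly a second load. */
    static long get(long[] blocks, int bitsPerValue, int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int offset = (int) (bitPos & 63);
        long value = blocks[block] >>> offset;
        if (offset + bitsPerValue > 64) {
            value |= blocks[block + 1] << (64 - offset);
        }
        return value & ((1L << bitsPerValue) - 1);
    }

    public static void main(String[] args) {
        int[] raw = {5, 1000, 123456, 99999, 0, 7};
        long[] packed = pack(raw, 17); // 17 bits is enough for values below 131072
        for (int i = 0; i < raw.length; i++) {
            // The raw read is raw[i]; the packed read goes through get().
            System.out.println(raw[i] + " == " + get(packed, 17, i));
        }
    }
}
```

When the bit width is rounded up to 8, 16, 32, or 64, no value straddles a block boundary and the reader can collapse to a plain array access, which is the effect a high acceptableOverheadRatio is meant to achieve according to the message above.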
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Fri, Oct 4, 2013 at 8:46 AM,  <[email protected]> wrote:
>>>
>>>
>>> Hi Mike,
>>>
>>>
>>>
>>> We're using docvalues to store geocoordinates in meters in X,Y,Z
>>> space, and discovering that they are taking more time to unpack than
>>> we'd like.  I was surprised to find no raw representation available
>>> for docvalues right now; otherwise, a fixed 4-byte representation
>>> would have been ideal. Would you have any suggestions?
>>>
>>>
>>>
>>> Karl
>>>
>>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]

