Re: FW: Is there a really performant way to store a full 32-bit int in doc values?

David Smiley (@MITRE.org) Tue, 08 Oct 2013 08:53:12 -0700

Hi Karl!

I suggest that you put the point data you need in BinaryDocValues.  That is,
put both the x & y into the same byte[] chunk.  I've done this for a Solr
integration in https://issues.apache.org/jira/browse/SOLR-5170
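
As a rough illustration of packing both coordinates into one byte[] chunk,
here is a minimal sketch. It assumes 32-bit int coordinates and a big-endian
layout with x first; PointCodec and its methods are hypothetical names and
are not taken from Lucene or from the SOLR-5170 patch.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not a Lucene or Solr class.
final class PointCodec {
    // Two 32-bit ints per point: x followed by y.
    static final int BYTES = 2 * Integer.BYTES;

    /** Packs x and y into one 8-byte array (big-endian, x first). */
    static byte[] encode(int x, int y) {
        return ByteBuffer.allocate(BYTES).putInt(x).putInt(y).array();
    }

    /** Reads x from a packed point that starts at the given offset. */
    static int decodeX(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, Integer.BYTES).getInt();
    }

    /** Reads y from a packed point that starts at the given offset. */
    static int decodeY(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset + Integer.BYTES, Integer.BYTES).getInt();
    }
}
```

On the Lucene side the array would be wrapped in a BytesRef and indexed with
a BinaryDocValuesField. The decode methods take an offset because a BytesRef
exposes its backing bytes together with an offset into them.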


~ David


karl.wright-2 wrote
> Hi All (and especially Robert),
> 
> Lucene NumericDocValues seems to operate slower than we would expect.  In
> our application, we're using it for storing coordinate values, which we
> retrieve to compute a distance.  While doing timings trying to determine
> the impact of including a sqrt in the calculation, we noted that the
> lucene overhead itself overwhelmed pretty much anything we did in the
> ValueSource.
> 
> One of our engineers did performance testing (code attached, hope it gets
> through), which shows what we are talking about.   Please see the thread
> below.  The question is: why is lucene 2.5x slower than a direct buffer
> access for this case?  And is there anything we can do in the Lucene
> paradigm to get our performance back closer to the direct buffer case?
> 
> Karl
> 
> -----Original Message-----
> From: Ziech Christian (HERE/Berlin) 
> Sent: Tuesday, October 08, 2013 9:08 AM
> To: Wright Karl (HERE/Cambridge)
> Subject: RE: Is there a really performant way to store a full 32-bit int
> in doc values?
> 
> Hi,
> 
> I have now tested the approach of using the NumericDocValues directly,
> and it indeed helps by about 20% compared to the original Lucene numbers.
> Lucene is still 2.5x slower than using a DirectBuffer alone, but it helps.
> The funny thing is that with Lucene the square root is almost
> meaningless. This can be explained by the CPU calculating the square
> root while other things are computed: since my micro-benchmark doesn't
> need the result for a while, the CPU can happily do other things in the
> meantime. Since we also have a lot of other query aspects, I assume we'd
> get that gain either way, so estimating about 30-50ms for the square
> root when scoring 25M documents should be about accurate. So what is
> Lucene doing that causes it to be 3 times slower than the naive approach?
> And why is that impact so big compared to that of a simple square root
> (which slows things down by ~20%, assuming the 30ms with more complex
> actions)? I mean, 20% vs 200% is an order of magnitude!
> As a side note: storing the values as an int when using a DirectBuffer
> doesn't seem helpful, I assume because we have to cast the int to float
> either way later.
> 
> BR
>   Christian
> 
> PS: The new numbers are:
> Scoring 25000000 documents with direct float buffers (without square root) took 190
> Scoring 25000000 documents with direct float buffers (without square root) took 171
> Scoring 25000000 documents with direct float buffers (without square root) took 172
> Scoring 25000000 documents with direct float buffers (and a square root) took 281
> Scoring 25000000 documents with direct float buffers (and a square root) took 280
> Scoring 25000000 documents with direct float buffers (and a square root) took 266
> Scoring 25000000 documents with a lucene float value source (without square root) took 1045
> Scoring 25000000 documents with a lucene float value source (without square root) took 625
> Scoring 25000000 documents with a lucene float value source (without square root) took 630
> Scoring 25000000 documents with a lucene float value source (and a square root) took 661
> Scoring 25000000 documents with a lucene float value source (and a square root) took 670
> Scoring 25000000 documents with a lucene float value source (and a square root) took 665
> Scoring 25000000 documents with direct int buffers (without square root) took 218
> Scoring 25000000 documents with direct int buffers (without square root) took 219
> Scoring 25000000 documents with direct int buffers (without square root) took 204
> Scoring 25000000 documents with a lucene numeric values (without square root) source took 1123
> Scoring 25000000 documents with a lucene numeric values (without square root) source took 500
> Scoring 25000000 documents with a lucene numeric values (without square root) source took 499
> Scoring 25000000 documents with a lucene numeric values (and a square root) source took 531
> Scoring 25000000 documents with a lucene numeric values (and a square root) source took 531
> Scoring 25000000 documents with a lucene numeric values (and a square root) source took 535
> 
> 
> ________________________________________
> From: Wright Karl (HERE/Cambridge)
> Sent: Monday, October 7, 2013 09:22
> To: Ziech Christian (HERE/Berlin)
> Subject: FW: Is there a really performant way to store a full 32-bit int
> in doc values?
> 
> -----Original Message-----
> From: ext Michael McCandless [mailto:lucene@]
> Sent: Monday, October 07, 2013 8:28 AM
> To: Wright Karl (HERE/Cambridge)
> Subject: Re: Is there a really performant way to store a full 32-bit int
> in doc values?
> 
> Well, it is a micro-benchmark ... so it'd be better to test in the
> wider/full context of the application?
> 
> I'm also a little worried that you go through ValueSource instead of
> interacting directly with the NumericDocValues instance; it's just an
> additional level of indirection that may confuse hotspot.  But it really
> ought not be so bad ...
> 
> Under the hood we encode a float to an int using Float.floatToRawIntBits;
> it could be that this doesn't work well w/ the compression we then do on
> the ints by default?  I'm curious which impl the Lucene45DocValuesConsumer
> is using in your case.  Looks like you are using random floats, so I'd
> expect it's using DELTA_COMPRESSED.
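
To illustrate why raw float bits can compress poorly, here is a small
sketch. It is not Lucene's codec: it assumes a simplified delta-from-minimum
encoding (an assumption made only for this example) and counts how many bits
each value would then need. FloatBitsDemo and its methods are hypothetical.

```java
// Hypothetical demo class; the encoding model is a simplification,
// not what Lucene45DocValuesConsumer actually does.
final class FloatBitsDemo {
    /** Converts floats to the int bit patterns that would be stored. */
    static long[] rawBits(float[] floats) {
        long[] out = new long[floats.length];
        for (int i = 0; i < floats.length; i++) {
            out[i] = Float.floatToRawIntBits(floats[i]);
        }
        return out;
    }

    /**
     * Bits needed per value if each value is stored as (value - min).
     * Expects a non-empty array whose values fit in the int range.
     */
    static int bitsRequired(long[] values) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        long range = max - min;
        return range == 0 ? 1 : 64 - Long.numberOfLeadingZeros(range);
    }
}
```

Under this model the ints 1 and 2 need 1 bit each, the floats 1.0f and 2.0f
need 24 bits, and the floats -1.0f and 1.0f need the full 32 bits, because
the sign and exponent sit in the high bits of the pattern.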
> 
> It'd be a simple test to just make your own DVFormat using raw 32 bit
> ints, to see how much that helps.
> 
> But, yes, I would just email the list and see if there are other ideas?
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> 
> On Mon, Oct 7, 2013 at 7:14 AM, <karl.wright@> wrote:
>> Hi Mike,
>>
>>
>>
>> Before I post to the general list, do you see any problem with our 
>> testing methodology?
>>
>>
>>
>> Basically, we conclude that by far the most expensive thing is 
>> retrieving the NumericDocValue value.  This currently overwhelms any 
>> expensive operations we might do in the scoring ourselves, which is 
>> why we're looking for potential improvements in that area.
>>
>>
>>
>> Do you agree with the assessment?
>>
>> Karl
>>
>>
>>
>> From: Ziech Christian (HERE/Berlin)
>> Sent: Friday, October 04, 2013 11:09 PM
>> To: Wright Karl (HERE/Cambridge)
>> Subject: RE: Is there a really performant way to store a full 32-bit 
>> int in doc values?
>>
>>
>>
>> Hi,
>>
>> maybe it's best if I share where I got my numbers from - I have 
>> written a small test (which originally should only test the
>> Math.sqrt() impact for 10M scorings).
>>
>> The output is (I looped over the search invocation to give lucene a 
>> chance to load everything):
>> Scoring 25000000 documents with direct buffers (without square root) took 203
>> Scoring 25000000 documents with direct buffers (without square root) took 179
>> Scoring 25000000 documents with direct buffers (without square root) took 172
>> Scoring 25000000 documents with direct buffers (and a square root) took 292
>> Scoring 25000000 documents with direct buffers (and a square root) took 289
>> Scoring 25000000 documents with direct buffers (and a square root) took 289
>> Scoring 25000000 documents with a lucene value (without square root) source took 1045
>> Scoring 25000000 documents with a lucene value (without square root) source took 656
>> Scoring 25000000 documents with a lucene value (without square root) source took 660
>> Scoring 25000000 documents with a lucene value (without square root) source took 658
>> Scoring 25000000 documents with a lucene value (without square root) source took 663
>> Scoring 25000000 documents with a lucene value (and a square root) source took 711
>> Scoring 25000000 documents with a lucene value (and a square root) source took 710
>> Scoring 25000000 documents with a lucene value (and a square root) source took 713
>> Scoring 25000000 documents with a lucene value (and a square root) source took 711
>> Scoring 25000000 documents with a lucene value (and a square root) source took 714
>>
>> So the impact of a square root is roughly 110ms, while the impact of 
>> using the Lucene function values is far higher (between 300 and 350ms, 
>> depending on the run). Interestingly, the square root impact is not as 
>> high on the Lucene function query for some reason (most likely Java or 
>> the CPU can simply optimize the very simple scorer best).
>>
>> I did measure the values with a FSDirectory and a RAMDirectory which 
>> both essentially yield the same performance. Do you see any problem 
>> with the attached code?
>>
>> BR
>>   Christian
>>
>> ________________________________
>>
>> From: Wright Karl (HERE/Cambridge)
>> Sent: Friday, October 4, 2013 20:56
>> To: Ziech Christian (HERE/Berlin)
>> Subject: FW: Is there a really performant way to store a full 32-bit 
>> int in doc values?
>>
>>
>> FYI
>> Karl
>>
>> Sent from my Windows Phone
>>
>> ________________________________
>>
>> From: ext Michael McCandless
>> Sent: 10/4/2013 4:51 PM
>> To: Wright Karl (HERE/Cambridge)
>> Subject: Re: Is there a really performant way to store a full 32-bit 
>> int in doc values?
>>
>> Hmmm, that's interesting that you see decode cost is too high.  Are 
>> you sure?
>>
>> Can you email the list?  I'm sure Rob will have suggestions.  The 
>> worst case is you make a custom DV format that stores things raw.
>>
>> 4.5 has a new default DocValuesFormat with more compression, but with 
>> values stored on disk by default (cached by the OS if you have the
>> RAM) ... I wonder how that would compare to what you're using now.
>>
>> I think the simplest thing to do is to instantiate the 
>> Lucene42DocValuesConsumer (renamed to MemoryDVConsumer in 4.5), 
>> passing a very high acceptableOverheadRatio?  This should cause 
>> packed ints to be upgraded to a byte[], short[], int[], long[].  If this 
>> is still not fast enough then I suspect a custom DVFormat that just 
>> uses int[] directly (avoiding the abstractions of packed ints) is your 
>> best shot.
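
To make the cost of the packed-ints abstraction more concrete, here is a
simplified sketch of bit-packed storage. It is not Lucene's PackedInts code;
PackedVsRaw is a hypothetical class, and it assumes non-negative values with
bitsPerValue between 1 and 31.

```java
// Hypothetical illustration of bit-packed storage; not Lucene's PackedInts.
final class PackedVsRaw {
    /** Packs non-negative values, each below 2^bitsPerValue, into long blocks. */
    static long[] pack(int[] values, int bitsPerValue) {
        long totalBits = (long) values.length * bitsPerValue;
        long[] blocks = new long[(int) ((totalBits + 63) / 64)];
        for (int i = 0; i < values.length; i++) {
            long bitPos = (long) i * bitsPerValue;
            int block = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            blocks[block] |= ((long) values[i]) << shift;
            if (shift + bitsPerValue > 64) {
                // The value straddles two blocks: store its high bits in the next one.
                blocks[block + 1] |= ((long) values[i]) >>> (64 - shift);
            }
        }
        return blocks;
    }

    /** Reads one value back: index math, shifts, a branch, and a mask. */
    static int get(long[] blocks, int bitsPerValue, int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        if (shift + bitsPerValue > 64) {
            value |= blocks[block + 1] << (64 - shift);
        }
        return (int) (value & ((1L << bitsPerValue) - 1));
    }
}
```

Every call to get does several shifts, a branch, and a mask, whereas a plain
int[] lookup is a single array read. That difference is the kind of overhead
a raw int[] format would avoid.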
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Fri, Oct 4, 2013 at 8:46 AM, <karl.wright@> wrote:
>>>
>>>
>>> Hi Mike,
>>>
>>>
>>>
>>> We're using docvalues to store geocoordinates in meters in X,Y,Z 
>>> space, and discovering that they are taking more time to unpack than 
>>> we'd like.  I was surprised to find no raw representation available 
>>> for docvalues right now - otherwise, a fixed 4-byte representation 
>>> would have been ideal. Would you have any suggestions?
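
For reference, a fixed 4-byte layout outside of Lucene could look like the
sketch below. It assumes three int coordinates in meters per document, stored
back to back in a direct buffer and addressed by document id.
DirectPointStore is a hypothetical class, not the attached test code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Hypothetical sketch of a raw, fixed-width coordinate store.
final class DirectPointStore {
    // x, y, z for document n live at indices 3n, 3n + 1, 3n + 2.
    private final IntBuffer coords;

    DirectPointStore(int maxDoc) {
        coords = ByteBuffer.allocateDirect(maxDoc * 3 * Integer.BYTES)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer();
    }

    void set(int docId, int x, int y, int z) {
        int base = docId * 3;
        coords.put(base, x).put(base + 1, y).put(base + 2, z);
    }

    /** Squared distance in square meters; take Math.sqrt for meters. */
    double distanceSquared(int docId, int qx, int qy, int qz) {
        int base = docId * 3;
        // Widen to double before subtracting to avoid int overflow.
        double dx = (double) coords.get(base) - qx;
        double dy = (double) coords.get(base + 1) - qy;
        double dz = (double) coords.get(base + 2) - qz;
        return dx * dx + dy * dy + dz * dz;
    }
}
```

Each lookup is three absolute reads at a computed index, with no decoding
step in between.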
>>>
>>>
>>>
>>> Karl
>>>
>>>
> 
> 
> LuceneFloatSourceTest.java (16K)
> <http://lucene.472066.n3.nabble.com/attachment/4094104/0/LuceneFloatSourceTest.java>





-----
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/FW-Is-there-a-really-performant-way-to-store-a-full-32-bit-int-in-doc-values-tp4094104p4094120.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
