RE: FW: Is there a really performant way to store a full 32-bit int in doc values?

karl.wright Tue, 08 Oct 2013 10:37:07 -0700

Hi David,

We tried that and still didn't come close to DirectBuffer speed.  It was only 
about 20% faster.  I've attached updated numbers.


We looked through the Lucene code and determined that very likely the costly 
part is loading each part of an int out of the byte array.  There are much 
faster (in fact, native) operations available for reading a whole word or float 
at one time, if we could get access to the DirectBuffer behind the DocValues 
implementation.  But when Lucene loads the byte array into Java heap memory 
that ability is lost.
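
To make the difference concrete, here is a minimal sketch of the two read paths described above. It is not Lucene code; the class and method names are hypothetical, and it only assumes the JDK's java.nio.ByteBuffer. The first method rebuilds an int from four separate byte loads, the second reads the whole word in one call on a direct buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not taken from Lucene.
class IntReadSketch {
    // Heap path: rebuild a big-endian int from four separate byte loads.
    static int readFromByteArray(byte[] bytes, int offset) {
        return ((bytes[offset] & 0xFF) << 24)
             | ((bytes[offset + 1] & 0xFF) << 16)
             | ((bytes[offset + 2] & 0xFF) << 8)
             | (bytes[offset + 3] & 0xFF);
    }

    // Direct-buffer path: one absolute getInt call reads the whole word.
    static int readFromDirectBuffer(ByteBuffer buffer, int offset) {
        return buffer.getInt(offset);
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(4).order(ByteOrder.BIG_ENDIAN);
        direct.putInt(0, 0x12345678);
        byte[] heap = {0x12, 0x34, 0x56, 0x78};
        // Both paths should decode the same value.
        System.out.println(readFromByteArray(heap, 0) == readFromDirectBuffer(direct, 0));
    }
}
```

Whether the JIT turns the second form into a single native load depends on the JVM and platform, so treat this as an explanation of the hypothesis rather than a measurement.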

Karl


-----Original Message-----
From: ext David Smiley (@MITRE.org) [mailto:[email protected]] 
Sent: Tuesday, October 08, 2013 11:52 AM
To: [email protected]
Subject: Re: FW: Is there a really performant way to store a full 32-bit int in 
doc values?

Hi Karl!

I suggest that you put the point data you need in BinaryDocValues; that is, 
put both x and y into the same byte[] chunk.  I've done this for a Solr 
integration in https://issues.apache.org/jira/browse/SOLR-5170
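
As a rough sketch of the idea (not the SOLR-5170 implementation, which may encode things differently), both coordinates could be packed into one 8-byte chunk like this. The class and method names are hypothetical, and floats are an assumption; the same layout works for ints.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not the SOLR-5170 code.
class PointBytesSketch {
    static final int BYTES_PER_POINT = 2 * Float.BYTES;

    // Packs x and y back to back into a single byte[] value.
    static byte[] encode(float x, float y) {
        return ByteBuffer.allocate(BYTES_PER_POINT).putFloat(x).putFloat(y).array();
    }

    // Reads x and y from the chunk starting at the given offset.
    static float[] decode(byte[] bytes, int offset) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes, offset, BYTES_PER_POINT);
        return new float[] {buffer.getFloat(), buffer.getFloat()};
    }
}
```

The resulting byte[] is what would be handed to a binary doc values field, so one lookup per document yields both coordinates.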

~ David


karl.wright-2 wrote
> Hi All (and especially Robert),
> 
> Lucene NumericDocValues seems to operate slower than we would expect.  
> In our application, we're using it for storing coordinate values, 
> which we retrieve to compute a distance.  While doing timings trying 
> to determine the impact of including a sqrt in the calculation, we 
> noted that the Lucene overhead itself overwhelmed pretty much anything 
> we did in the ValueSource.
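
For context, the direct-buffer baseline in this kind of comparison might look roughly like the sketch below. This is not the attached LuceneFloatSourceTest.java; the class name, the interleaved x/y layout, and the 2D distance are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical illustration only; not the attached test code.
class DirectBufferScoringSketch {
    // Stores points as interleaved x, y floats in off-heap memory.
    static FloatBuffer store(float[] xy) {
        FloatBuffer buffer = ByteBuffer.allocateDirect(xy.length * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        buffer.put(xy);
        buffer.flip();
        return buffer;
    }

    // Sums the distance (or squared distance) of every point from (qx, qy).
    static double score(FloatBuffer points, float qx, float qy, boolean useSqrt) {
        double total = 0;
        int count = points.limit() / 2;
        for (int doc = 0; doc < count; doc++) {
            float dx = points.get(2 * doc) - qx;
            float dy = points.get(2 * doc + 1) - qy;
            double squared = dx * dx + dy * dy;
            total += useSqrt ? Math.sqrt(squared) : squared;
        }
        return total;
    }

    public static void main(String[] args) {
        FloatBuffer points = store(new float[] {3f, 4f, 6f, 8f});
        long start = System.nanoTime();
        double total = score(points, 0f, 0f, true);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println("total=" + total + " in " + micros + "us");
    }
}
```

A real timing run would use millions of points and several warm-up iterations, as the looped invocations in the numbers below suggest.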
> 
> One of our engineers did performance testing (code attached, hope it gets
> through), which shows what we are talking about.   Please see the thread
> below.  The question is: why is Lucene 2.5x slower than a direct 
> buffer access for this case?  And is there anything we can do in the 
> Lucene paradigm to get our performance back closer to the direct buffer case?
> 
> Karl
> 
> -----Original Message-----
> From: Ziech Christian (HERE/Berlin)
> Sent: Tuesday, October 08, 2013 9:08 AM
> To: Wright Karl (HERE/Cambridge)
> Subject: AW: Is there a really performant way to store a full 32-bit 
> int in doc values?
> 
> Hi,
> 
> I have now tested the approach of using the NumericDocValues 
> directly, and it indeed helps by about 20% compared to the original Lucene 
> numbers. Lucene is still 2.5x slower than using a DirectBuffer alone, but it 
> helps.
> The funny thing is that with Lucene, using the square root is 
> almost meaningless. This can be explained by the CPU calculating 
> the square root while other things are computed: since my 
> micro-benchmark doesn't need the result for a while, the CPU can happily do 
> other things in the meantime. Since we also have a lot of other query 
> aspects, I assume we'd get that gain either way, so calculating about 
> 30-50ms for the square root when scoring 25M documents should be 
> about accurate. So what is Lucene doing that causes it to be 3 times slower 
> than the naive approach?
> And why is that impact so big compared to that of a simple square root 
> (which slows things down by ~20%, assuming the 30ms with more complex
> actions)? I mean, 20% vs 200% is an order of magnitude!
> As a side note: storing the values as an int when using a DirectBuffer 
> doesn't seem helpful, I assume because we have to cast the int to 
> float either way later.
> 
> BR
>   Christian
> 
> PS: The new numbers are:
> Scoring 25000000 documents with direct float buffers (without square root) took 190
> Scoring 25000000 documents with direct float buffers (without square root) took 171
> Scoring 25000000 documents with direct float buffers (without square root) took 172
> Scoring 25000000 documents with direct float buffers (and a square root) took 281
> Scoring 25000000 documents with direct float buffers (and a square root) took 280
> Scoring 25000000 documents with direct float buffers (and a square root) took 266
> Scoring 25000000 documents with a lucene float value source (without square root) took 1045
> Scoring 25000000 documents with a lucene float value source (without square root) took 625
> Scoring 25000000 documents with a lucene float value source (without square root) took 630
> Scoring 25000000 documents with a lucene float value source (and a square root) took 661
> Scoring 25000000 documents with a lucene float value source (and a square root) took 670
> Scoring 25000000 documents with a lucene float value source (and a square root) took 665
> Scoring 25000000 documents with direct int buffers (without square root) took 218
> Scoring 25000000 documents with direct int buffers (without square root) took 219
> Scoring 25000000 documents with direct int buffers (without square root) took 204
> Scoring 25000000 documents with a lucene numeric values (without square root) source took 1123
> Scoring 25000000 documents with a lucene numeric values (without square root) source took 500
> Scoring 25000000 documents with a lucene numeric values (without square root) source took 499
> Scoring 25000000 documents with a lucene numeric values (and a square root) source took 531
> Scoring 25000000 documents with a lucene numeric values (and a square root) source took 531
> Scoring 25000000 documents with a lucene numeric values (and a square root) source took 535
> 
> 
> ________________________________________
> Von: Wright Karl (HERE/Cambridge)
> Gesendet: Montag, 7. Oktober 2013 09:22
> An: Ziech Christian (HERE/Berlin)
> Betreff: FW: Is there a really performant way to store a full 32-bit 
> int in doc values?
> 
> -----Original Message-----
> From: ext Michael McCandless [mailto:lucene@]
> Sent: Monday, October 07, 2013 8:28 AM
> To: Wright Karl (HERE/Cambridge)
> Subject: Re: Is there a really performant way to store a full 32-bit 
> int in doc values?
> 
> Well, it is a micro-benchmark ... so it'd be better to test in the 
> wider/full context of the application?
> 
> I'm also a little worried that you go through ValueSource instead of 
> interacting directly with the NumericDocValues instance; it's just an 
> additional level of indirection that may confuse hotspot.  But it 
> really ought not be so bad ...
> 
> Under the hood we encode a float to an int using 
> Float.floatToRawIntBits; it could be that this doesn't work well w/ 
> the compression we then do on the ints by default?  I'm curious which 
> impl the Lucene45DocValuesConsumer is using in your case.  Looks like 
> you are using random floats, so I'd expect it's using DELTA_COMPRESSED.
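
One way to see why raw float bits may compress poorly is to count how many bits a simple "offset from the minimum" scheme would need. The sketch below is a simplified model under that assumption, not a description of Lucene's actual encoder; the class and method names are hypothetical.

```java
// Hypothetical illustration only; a simplified model, not Lucene's encoder.
class FloatBitsSketch {
    // Bits needed to store (value - min) for the raw int encodings of the floats.
    static int bitsForDeltaRange(float[] values) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (float v : values) {
            long bits = Float.floatToRawIntBits(v);
            min = Math.min(min, bits);
            max = Math.max(max, bits);
        }
        long range = max - min;
        // numberOfLeadingZeros(0) is 64, so identical values need 0 bits.
        return 64 - Long.numberOfLeadingZeros(range);
    }
}
```

In this model, floats spread over even a modest range already need most of the 32 bits, and mixing positive and negative values needs all of them, so the compression has little to gain while still adding decode work.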
> 
> It'd be a simple test to just make your own DVFormat using raw 32 bit 
> ints, to see how much that helps.
> 
> But, yes, I would just email the list and see if there are other ideas?
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> 
> On Mon, Oct 7, 2013 at 7:14 AM, <karl.wright@> wrote:
>> Hi Mike,
>>
>>
>>
>> Before I post to the general list, do you see any problem with our 
>> testing methodology?
>>
>>
>>
>> Basically, we conclude that by far the most expensive thing is 
>> retrieving the NumericDocValue value.  This currently overwhelms any 
>> expensive operations we might do in the scoring ourselves, which is 
>> why we're looking for potential improvements in that area.
>>
>>
>>
>> Do you agree with the assessment?
>>
>> Karl
>>
>>
>>
>> From: Ziech Christian (HERE/Berlin)
>> Sent: Friday, October 04, 2013 11:09 PM
>> To: Wright Karl (HERE/Cambridge)
>> Subject: AW: Is there a really performant way to store a full 32-bit 
>> int in doc values?
>>
>>
>>
>> Hi,
>>
>> maybe it's best if I share where I got my numbers from - I have 
>> written a small test (which originally should only test the
>> Math.sqrt() impact for 10M scorings).
>>
>> The output is (I looped over the search invocation to give lucene a 
>> chance to load everything):
>> Scoring 25000000 documents with direct buffers (without square root) took 203
>> Scoring 25000000 documents with direct buffers (without square root) took 179
>> Scoring 25000000 documents with direct buffers (without square root) took 172
>> Scoring 25000000 documents with direct buffers (and a square root) took 292
>> Scoring 25000000 documents with direct buffers (and a square root) took 289
>> Scoring 25000000 documents with direct buffers (and a square root) took 289
>> Scoring 25000000 documents with a lucene value (without square root) source took 1045
>> Scoring 25000000 documents with a lucene value (without square root) source took 656
>> Scoring 25000000 documents with a lucene value (without square root) source took 660
>> Scoring 25000000 documents with a lucene value (without square root) source took 658
>> Scoring 25000000 documents with a lucene value (without square root) source took 663
>> Scoring 25000000 documents with a lucene value (and a square root) source took 711
>> Scoring 25000000 documents with a lucene value (and a square root) source took 710
>> Scoring 25000000 documents with a lucene value (and a square root) source took 713
>> Scoring 25000000 documents with a lucene value (and a square root) source took 711
>> Scoring 25000000 documents with a lucene value (and a square root) source took 714
>>
>> So the impact of a square root is roughly 110ms, while the impact of 
>> using the Lucene function values is far higher (depending on the run, 
>> between 300-350ms). Interestingly, the square root impact is not as 
>> high on the Lucene function query for some reason (most likely Java 
>> or the CPU can just optimize the very simple scorer best).
>>
>> I did measure the values with a FSDirectory and a RAMDirectory which 
>> both essentially yield the same performance. Do you see any problem 
>> with the attached code?
>>
>> BR
>>   Christian
>>
>> ________________________________
>>
>> Von: Wright Karl (HERE/Cambridge)
>> Gesendet: Freitag, 4. Oktober 2013 20:56
>> An: Ziech Christian (HERE/Berlin)
>> Betreff: FW: Is there a really performant way to store a full 32-bit 
>> int in doc values?
>>
>>
>> FYI
>> Karl
>>
>> Sent from my Windows Phone
>>
>> ________________________________
>>
>> From: ext Michael McCandless
>> Sent: 10/4/2013 4:51 PM
>> To: Wright Karl (HERE/Cambridge)
>> Subject: Re: Is there a really performant way to store a full 32-bit 
>> int in doc values?
>>
>> Hmmm, that's interesting that you see decode cost is too high.  Are 
>> you sure?
>>
>> Can you email the list?  I'm sure Rob will have suggestions.  The 
>> worst case is you make a custom DV format that stores things raw.
>>
>> 4.5 has a new default DocValuesFormat with more compression, but with 
>> values stored on disk by default (cached by the OS if you have the
>> RAM) ... I wonder how that would compare to what you're using now.
>>
>> I think the simplest thing to do is to instantiate the 
>> Lucene42DocValuesConsumer (renamed to MemoryDVConsumer in 4.5), 
>> passing a very high acceptableOverheadRatio?  This should cause 
>> packed ints to be upgraded to a byte[], short[], int[], or long[].  If this 
>> is still not fast enough then I suspect a custom DVFormat that just 
>> uses int[] directly (avoiding the abstractions of packed ints) is 
>> your best shot.
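
To illustrate what "the abstractions of packed ints" cost compared with a plain int[] lookup, here is a simplified bit-packing sketch. It is not Lucene's packed ints implementation; the class, the long[] block layout, and the method names are assumptions chosen for illustration, and bitsPerValue is assumed to be between 1 and 63.

```java
// Hypothetical illustration only; not Lucene's packed ints code.
class PackedReadSketch {
    // Writes the low bitsPerValue bits of value at the given logical index.
    static void set(long[] blocks, int bitsPerValue, int index, long value) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long mask = (1L << bitsPerValue) - 1;
        long v = value & mask;
        blocks[block] = (blocks[block] & ~(mask << shift)) | (v << shift);
        int spill = shift + bitsPerValue - 64; // bits that overflow into the next block
        if (spill > 0) {
            long highMask = (1L << spill) - 1;
            blocks[block + 1] = (blocks[block + 1] & ~highMask) | (v >>> (bitsPerValue - spill));
        }
    }

    // Reads the value at the given logical index: shifts, a mask, and sometimes two loads.
    static long get(long[] blocks, int bitsPerValue, int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long mask = (1L << bitsPerValue) - 1;
        long value = blocks[block] >>> shift;
        if (shift + bitsPerValue > 64) {
            value |= blocks[block + 1] << (64 - shift);
        }
        return value & mask;
    }
}
```

With a width that matches a primitive type (8, 16, 32, or 64 bits), every value is aligned and the read collapses to a single array access, which is the effect a very high acceptableOverheadRatio is meant to trigger.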
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Fri, Oct 4, 2013 at 8:46 AM, <karl.wright@> wrote:
>>>
>>>
>>> Hi Mike,
>>>
>>>
>>>
>>> We're using docvalues to store geocoordinates in meters in X,Y,Z 
>>> space, and discovering that they are taking more time to unpack than 
>>> we'd like.  I was surprised to find no raw representation available 
>>> for docvalues right now - otherwise, a fixed 4-byte representation 
>>> would have been ideal.  Would you have any suggestions?
>>>
>>>
>>>
>>> Karl
>>>
>>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 
> LuceneFloatSourceTest.java (16K)
> <http://lucene.472066.n3.nabble.com/attachment/4094104/0/LuceneFloatSourceTest.java>





-----
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/FW-Is-there-a-really-performant-way-to-store-a-full-32-bit-int-in-doc-values-tp4094104p4094120.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

LuceneFloatSourceTest.java
Description: LuceneFloatSourceTest.java
