sure, but "if length > 8 return 1" might pass these same tests too, yet cause a ton of hash collisions.
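e.g. a quick distribution check on long terms would catch that strawman immediately. A self-contained sketch of such a test follows; note the `hash` here is a stand-in FNV-1a rather than Lucene's actual murmurhash3_x86_32, and `degenerateHash` is the hypothetical "length > 8 return 1" strawman, not anyone's real patch:

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class LongStringHashTest {
    // Stand-in for the hash under test; in Lucene this would be
    // BytesRef.hashCode(). A simple FNV-1a keeps the sketch self-contained.
    static int hash(byte[] data) {
        int h = 0x811c9dc5;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // The degenerate "if length > 8 return 1" strawman: it could pass
    // pure correctness regression tests yet collapse all long terms
    // into a single hash bucket.
    static int degenerateHash(byte[] data) {
        return data.length > 8 ? 1 : hash(data);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int n = 100_000;
        Set<Integer> good = new HashSet<>();
        Set<Integer> bad = new HashSet<>();
        for (int i = 0; i < n; i++) {
            byte[] term = new byte[16 + rnd.nextInt(48)]; // long terms only
            rnd.nextBytes(term);
            good.add(hash(term));
            bad.add(degenerateHash(term));
        }
        // With ~1e5 draws from 2^32 buckets the birthday bound predicts
        // only a collision or two; the degenerate hash gives one bucket.
        if (n - good.size() > 10) throw new AssertionError("too many collisions");
        if (bad.size() != 1) throw new AssertionError("strawman should collapse to 1");
        System.out.println("distribution check passed");
    }
}
```

A test along these lines distinguishes "produces the same bits as before" from "still distributes well", which sampling-based correctness checks alone do not.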
I just think if you want to optimize for super-long strings, there should
be a unit test.

On Tue, Apr 25, 2023 at 10:20 AM Thomas Dullien
<thomas.dull...@elastic.co> wrote:
>
> Hey,
>
> I am pretty confident about correctness. The change passes both Lucene
> and ES regression tests, and a careful reading of the code makes me
> pretty certain that the output is the same. If you want me to randomly
> test the result for a few hundred million random strings, I'm happy to
> do that too; if you have other suggestions for correctness testing, let
> me know.
>
> The change does increase the method size and may impact inlining - but
> so does literally any code change, particularly in a JIT'ed environment
> where placement of code (and hence things like instruction cache
> conflicts) depends on the precise history of execution. The way I
> understand it, one deals with this by benchmarking and measuring.
>
> FWIW, several indexing-heavy ES benchmarks show a noticeable improvement
> in indexing speed - this is why I was asking about a broad range of
> Lucene benchmarks: to verify that this is indeed the case for
> Lucene-only, too.
>
> Let me know what data you'd like to see to decide whether this patch is
> a good idea, and if there is consensus among the Lucene committers that
> those are reasonable criteria, I'll work on producing that data.
>
> Cheers,
> Thomas
>
>
> On Tue, Apr 25, 2023 at 4:02 PM Robert Muir <rcm...@gmail.com> wrote:
>>
>> well there is some cost, as it must add additional checks to see if
>> it's longer than 8; in your patch, additional loops. It increases the
>> method size and may impact inlining and other things. Also we can't
>> forget about correctness: if the hash function does the wrong thing it
>> could slow everything to a crawl.
>>
>> On Tue, Apr 25, 2023 at 9:56 AM Thomas Dullien
>> <thomas.dull...@elastic.co> wrote:
>> >
>> > Ah, I see what you mean.
>> >
>> > You are correct -- the change will not speed up a 5-byte word, but it
>> > *will* speed up all 8+-byte words, at no cost to the shorter words.
>> >
>> > On Tue, Apr 25, 2023 at 3:20 PM Robert Muir <rcm...@gmail.com> wrote:
>> >>
>> >> if a word is of length 5, processing 8 bytes at a time isn't going
>> >> to speed anything up. there aren't 8 bytes to process.
>> >>
>> >> On Tue, Apr 25, 2023 at 9:17 AM Thomas Dullien
>> >> <thomas.dull...@elastic.co.invalid> wrote:
>> >> >
>> >> > Is average word length <= 4 realistic, though? I mean, even the
>> >> > English wiki corpus has ~5, which would require two calls to the
>> >> > Lucene layer instead of one; e.g. multiple layers of virtual
>> >> > dispatch that are unnecessary.
>> >> >
>> >> > You're not going to pay any cycles for reading 8 bytes instead of
>> >> > 4 bytes, so the cost of doing so will be the same - while speeding
>> >> > up the cases where 4 isn't quite enough.
>> >> >
>> >> > Cheers,
>> >> > Thomas
>> >> >
>> >> > On Tue, Apr 25, 2023 at 3:07 PM Robert Muir <rcm...@gmail.com> wrote:
>> >> >>
>> >> >> i think from my perspective it has nothing to do with cpus being
>> >> >> 32-bit or 64-bit and more to do with the average length of terms
>> >> >> in most languages being smaller than 8. for the languages with
>> >> >> longer word length, it's usually because of complex morphology
>> >> >> that most users would stem away. so doing 4 bytes at a time seems
>> >> >> optimal IMO. languages from nature don't care about your cpu.
>> >> >>
>> >> >> On Tue, Apr 25, 2023 at 8:52 AM Michael McCandless
>> >> >> <luc...@mikemccandless.com> wrote:
>> >> >> >
>> >> >> > For a truly "pure" indexing test I usually use a single thread
>> >> >> > for indexing, and SerialMergeScheduler (using that single
>> >> >> > thread to also do single-threaded merging). It makes the
>> >> >> > indexing take forever lol but it produces "comparable" results.
>> >> >> >
>> >> >> > But ... this sounds like a great change anyway?
>> >> >> > Do we really need to gate it on benchmark results? Do we think
>> >> >> > there could be a downside, e.g. slower indexing on (the
>> >> >> > dwindling) 32-bit CPUs?
>> >> >> >
>> >> >> > Mike McCandless
>> >> >> >
>> >> >> > http://blog.mikemccandless.com
>> >> >> >
>> >> >> >
>> >> >> > On Tue, Apr 25, 2023 at 7:39 AM Robert Muir <rcm...@gmail.com> wrote:
>> >> >> >>
>> >> >> >> I think the results of the benchmark will depend on the
>> >> >> >> properties of the indexed terms. For english wikipedia
>> >> >> >> (luceneutil) the average word length is around 5 bytes, so
>> >> >> >> this optimization may not do much.
>> >> >> >>
>> >> >> >> On Tue, Apr 25, 2023 at 1:58 AM Patrick Zhai <zhai7...@gmail.com> wrote:
>> >> >> >> >
>> >> >> >> > I did a quick run with your patch, but since I turned on CMS
>> >> >> >> > as well as TieredMergePolicy, I'm not sure how fair the
>> >> >> >> > comparison is. Here's the result:
>> >> >> >> > Candidate:
>> >> >> >> > Indexer: indexing done (890209 msec); total 33332620 docs
>> >> >> >> > Indexer: waitForMerges done (71622 msec)
>> >> >> >> > Indexer: finished (961877 msec)
>> >> >> >> > Baseline:
>> >> >> >> > Indexer: indexing done (909706 msec); total 33332620 docs
>> >> >> >> > Indexer: waitForMerges done (54775 msec)
>> >> >> >> > Indexer: finished (964528 msec)
>> >> >> >> >
>> >> >> >> > For a more accurate comparison I guess it's better to use
>> >> >> >> > LogxxMergePolicy and turn off CMS? If you want to run it
>> >> >> >> > yourself, you can find the lines I quoted in the log file.
>> >> >> >> >
>> >> >> >> > Patrick
>> >> >> >> >
>> >> >> >> > On Mon, Apr 24, 2023 at 12:34 PM Thomas Dullien
>> >> >> >> > <thomas.dull...@elastic.co.invalid> wrote:
>> >> >> >> >>
>> >> >> >> >> Hey all,
>> >> >> >> >>
>> >> >> >> >> I've been experimenting with fixing some low-hanging
>> >> >> >> >> performance fruit in the Elasticsearch codebase, and came
>> >> >> >> >> across the fact that the MurmurHash implementation used by
>> >> >> >> >> BytesRef.hashCode() reads 4 bytes per loop iteration (which
>> >> >> >> >> is likely an artifact from 32-bit architectures, which are
>> >> >> >> >> ever less important). I made a small fix to change the
>> >> >> >> >> implementation to read 8 bytes per loop iteration; I
>> >> >> >> >> expected a very small impact (2-3% CPU or so over an
>> >> >> >> >> indexing run in Elasticsearch), but got a pretty nontrivial
>> >> >> >> >> throughput improvement over a few indexing benchmarks.
>> >> >> >> >>
>> >> >> >> >> I tried running Lucene-only benchmarks, and succeeded in
>> >> >> >> >> running the example from
>> >> >> >> >> https://github.com/mikemccand/luceneutil - but I couldn't
>> >> >> >> >> figure out how to run indexing benchmarks and how to
>> >> >> >> >> interpret the results.
>> >> >> >> >>
>> >> >> >> >> Could someone help me in running the benchmarks for the
>> >> >> >> >> attached patch?
>> >> >> >> >>
>> >> >> >> >> Cheers,
>> >> >> >> >> Thomas
>> >> >> >> >>
>> >> >> >> >> ---------------------------------------------------------------------
>> >> >> >> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> >> >> >> >> For additional commands, e-mail: dev-h...@lucene.apache.org
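For readers outside the thread: the optimization under discussion can be sketched as below. This is an independent illustration of the MurmurHash3 x86_32 block loop, not the actual Lucene `StringHelper` code and not the attached patch; the "wide" variant is one hypothetical way to pull 8 bytes per read while splitting them into two 4-byte lanes so the output stays bit-identical:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Murmur8ByteSketch {
    // Classic MurmurHash3 x86_32: one 4-byte little-endian read per
    // iteration of the main loop.
    static int murmur32(byte[] data, int seed) {
        int h = seed;
        int len = data.length;
        int i = 0;
        for (; i + 4 <= len; i += 4) {
            int k = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                  | ((data[i + 2] & 0xff) << 16) | ((data[i + 3] & 0xff) << 24);
            h = mix(h, k);
        }
        int k = 0;
        switch (len - i) { // tail bytes; intentional fallthrough
            case 3: k ^= (data[i + 2] & 0xff) << 16;
            case 2: k ^= (data[i + 1] & 0xff) << 8;
            case 1: k ^= (data[i] & 0xff);
                    k *= 0xcc9e2d51; k = Integer.rotateLeft(k, 15); k *= 0x1b873593;
                    h ^= k;
        }
        h ^= len;
        return fmix(h);
    }

    // Same output, but the main loop does one 8-byte read per iteration
    // and feeds the two 4-byte halves through the same mixing steps.
    static int murmur32Wide(byte[] data, int seed) {
        int h = seed;
        int len = data.length;
        int i = 0;
        ByteBuffer bb = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        for (; i + 8 <= len; i += 8) {
            long v = bb.getLong(i);        // one 8-byte read
            h = mix(h, (int) v);           // low 4 bytes first (LE order)
            h = mix(h, (int) (v >>> 32));  // then the high 4 bytes
        }
        for (; i + 4 <= len; i += 4) {     // at most one leftover 4-byte block
            h = mix(h, bb.getInt(i));
        }
        int k = 0;
        switch (len - i) { // identical tail handling
            case 3: k ^= (data[i + 2] & 0xff) << 16;
            case 2: k ^= (data[i + 1] & 0xff) << 8;
            case 1: k ^= (data[i] & 0xff);
                    k *= 0xcc9e2d51; k = Integer.rotateLeft(k, 15); k *= 0x1b873593;
                    h ^= k;
        }
        h ^= len;
        return fmix(h);
    }

    static int mix(int h, int k) {
        k *= 0xcc9e2d51;
        k = Integer.rotateLeft(k, 15);
        k *= 0x1b873593;
        h ^= k;
        h = Integer.rotateLeft(h, 13);
        return h * 5 + 0xe6546b64;
    }

    static int fmix(int h) {
        h ^= h >>> 16; h *= 0x85ebca6b;
        h ^= h >>> 13; h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        java.util.Random rnd = new java.util.Random(0);
        for (int trial = 0; trial < 10_000; trial++) {
            byte[] b = new byte[rnd.nextInt(64)];
            rnd.nextBytes(b);
            if (murmur32(b, 0) != murmur32Wide(b, 0))
                throw new AssertionError("mismatch at length " + b.length);
        }
        System.out.println("4-byte and 8-byte loops agree");
    }
}
```

Because the wide loop is a pure transformation of the read pattern, its output is verifiable against the 4-byte loop exhaustively over lengths and randomized contents, which is the kind of correctness evidence discussed above; whether the extra length checks and the leftover-block loop pay off on sub-8-byte terms is exactly what the benchmarks are meant to settle.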