Hey,

I am pretty confident about correctness. The change passes both the Lucene
and ES regression tests, and from a careful reading of the code I am pretty
certain that the output is identical. If you want me to randomly test the
result for a few hundred million random strings, I'm happy to do that, too;
if you have other suggestions for correctness testing, let me know.
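To make that concrete, here is a minimal, self-contained sketch of the kind of differential check I have in mind. This is not the actual Lucene implementation (Lucene's version lives in StringHelper); the mix/tail/finalization below are a plain MurmurHash3 x86_32 core written for illustration. The point is only that an 8-bytes-per-iteration reading must produce bit-identical output to the 4-bytes-per-iteration baseline for every input:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Random;

public class MurmurCheck {
  // One MurmurHash3 x86_32 round over a single 4-byte block.
  static int mix(int h1, int k1) {
    k1 *= 0xcc9e2d51;
    k1 = Integer.rotateLeft(k1, 15);
    k1 *= 0x1b873593;
    h1 ^= k1;
    h1 = Integer.rotateLeft(h1, 13);
    return h1 * 5 + 0xe6546b64;
  }

  // Baseline: read 4 bytes per loop iteration.
  static int hash4(byte[] data, int seed) {
    int h1 = seed;
    int roundedEnd = data.length & ~3;
    ByteBuffer bb = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
    for (int i = 0; i < roundedEnd; i += 4) {
      h1 = mix(h1, bb.getInt(i));
    }
    return finish(h1, data, roundedEnd);
  }

  // Candidate: read 8 bytes per iteration and split the long into two
  // 4-byte lanes, so the byte stream is consumed in exactly the same order.
  static int hash8(byte[] data, int seed) {
    int h1 = seed;
    ByteBuffer bb = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
    int i = 0;
    for (; i + 8 <= data.length; i += 8) {
      long k = bb.getLong(i);
      h1 = mix(h1, (int) k);          // low 4 bytes first (little-endian)
      h1 = mix(h1, (int) (k >>> 32)); // then the high 4 bytes
    }
    if (i + 4 <= data.length) { // at most one leftover 4-byte block
      h1 = mix(h1, bb.getInt(i));
      i += 4;
    }
    return finish(h1, data, i);
  }

  // Common tail (1-3 trailing bytes) plus final avalanche.
  static int finish(int h1, byte[] data, int start) {
    int k1 = 0;
    switch (data.length & 3) {
      case 3: k1 = (data[start + 2] & 0xff) << 16;
      case 2: k1 |= (data[start + 1] & 0xff) << 8;
      case 1: k1 |= (data[start] & 0xff);
        k1 *= 0xcc9e2d51;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= 0x1b873593;
        h1 ^= k1;
    }
    h1 ^= data.length;
    h1 ^= h1 >>> 16; h1 *= 0x85ebca6b;
    h1 ^= h1 >>> 13; h1 *= 0xc2b2ae35;
    h1 ^= h1 >>> 16;
    return h1;
  }

  public static void main(String[] args) {
    Random r = new Random(42);
    for (int trial = 0; trial < 100_000; trial++) {
      byte[] data = new byte[r.nextInt(64)]; // covers all length residues
      r.nextBytes(data);
      if (hash4(data, 0) != hash8(data, 0)) {
        throw new AssertionError("mismatch at trial " + trial);
      }
    }
    System.out.println("OK");
  }
}
```

The "few hundred million random strings" variant is the same loop with a larger trial count; since both paths consume identical little-endian 4-byte blocks, any divergence would show up quickly.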

The change does increase the method size and may impact inlining - but so
does literally any code change, particularly in a JIT'ed environment, where
the placement of code (and hence things like instruction-cache conflicts)
depends on the precise history of execution. The way I understand it, one
deals with this by benchmarking and measuring.

FWIW, several indexing-heavy ES benchmarks show a noticeable improvement in
indexing speed - this is why I was asking about a broad range of Lucene
benchmarks: to verify that the improvement holds for Lucene-only workloads, too.

Let me know what data you'd like to see to decide whether this patch is a
good idea, and if there is consensus among the Lucene committers that those
are reasonable criteria, I'll work on producing that data.

Cheers,
Thomas



On Tue, Apr 25, 2023 at 4:02 PM Robert Muir <rcm...@gmail.com> wrote:

> well there is some cost, as it must add additional checks to see if
> it's longer than 8. in your patch, additional loops. it increases the
> method size and may impact inlining and other things. also we can't
> forget about correctness: if the hash function does the wrong thing it
> could slow everything to a crawl.
>
> On Tue, Apr 25, 2023 at 9:56 AM Thomas Dullien
> <thomas.dull...@elastic.co> wrote:
> >
> > Ah, I see what you mean.
> >
> > You are correct -- the change will not speed up a 5-byte word, but it
> > *will* speed up all words of 8+ bytes, at no cost to the shorter words.
> >
> > On Tue, Apr 25, 2023 at 3:20 PM Robert Muir <rcm...@gmail.com> wrote:
> >>
> >> if a word is of length 5, processing 8 bytes at a time isn't going to
> >> speed anything up. there aren't 8 bytes to process.
> >>
> >> On Tue, Apr 25, 2023 at 9:17 AM Thomas Dullien
> >> <thomas.dull...@elastic.co.invalid> wrote:
> >> >
> >> > Is average word length <= 4 realistic, though? I mean, even the
> >> > English wiki corpus has ~5, which would require two calls to the
> >> > Lucene layer instead of one, i.e. multiple layers of virtual dispatch
> >> > that are unnecessary.
> >> >
> >> > You're not going to pay any cycles for reading 8 bytes instead of 4
> >> > bytes, so the cost will be the same, while speeding things up in
> >> > cases where 4 bytes isn't quite enough?
> >> >
> >> > Cheers,
> >> > Thomas
> >> >
> >> > On Tue, Apr 25, 2023 at 3:07 PM Robert Muir <rcm...@gmail.com> wrote:
> >> >>
> >> >> i think from my perspective it has nothing to do with cpus being
> >> >> 32-bit or 64-bit and more to do with the average length of terms in
> >> >> most languages being smaller than 8. for the languages with longer
> >> >> word length, it's usually because of complex morphology that most
> >> >> users would stem away. so doing 4 bytes at a time seems optimal IMO.
> >> >> languages from nature don't care about your cpu.
> >> >>
> >> >> On Tue, Apr 25, 2023 at 8:52 AM Michael McCandless
> >> >> <luc...@mikemccandless.com> wrote:
> >> >> >
> >> >> > For a truly "pure" indexing test I usually use a single thread for
> >> >> > indexing, and SerialMergeScheduler (using that single thread to also
> >> >> > do single-threaded merging). It makes the indexing take forever lol
> >> >> > but it produces "comparable" results.
> >> >> >
> >> >> > But ... this sounds like a great change anyway? Do we really need to
> >> >> > gate it on benchmark results? Do we think there could be a downside,
> >> >> > e.g. slower indexing on (the dwindling) 32-bit CPUs?
> >> >> >
> >> >> > Mike McCandless
> >> >> >
> >> >> > http://blog.mikemccandless.com
> >> >> >
> >> >> >
> >> >> > On Tue, Apr 25, 2023 at 7:39 AM Robert Muir <rcm...@gmail.com>
> wrote:
> >> >> >>
> >> >> >> I think the results of the benchmark will depend on the properties
> >> >> >> of the indexed terms. For English Wikipedia (luceneutil) the average
> >> >> >> word length is around 5 bytes, so this optimization may not do much.
> >> >> >>
> >> >> >> On Tue, Apr 25, 2023 at 1:58 AM Patrick Zhai <zhai7...@gmail.com>
> wrote:
> >> >> >> >
> >> >> >> > I did a quick run with your patch, but since I turned on CMS as
> >> >> >> > well as TieredMergePolicy, I'm not sure how fair the comparison
> >> >> >> > is. Here's the result:
> >> >> >> > Candidate:
> >> >> >> > Indexer: indexing done (890209 msec); total 33332620 docs
> >> >> >> > Indexer: waitForMerges done (71622 msec)
> >> >> >> > Indexer: finished (961877 msec)
> >> >> >> > Baseline:
> >> >> >> > Indexer: indexing done (909706 msec); total 33332620 docs
> >> >> >> > Indexer: waitForMerges done (54775 msec)
> >> >> >> > Indexer: finished (964528 msec)
> >> >> >> >
> >> >> >> > For a more accurate comparison I guess it's better to use
> >> >> >> > LogxxMergePolicy and turn off CMS? If you want to run it yourself,
> >> >> >> > you can find the lines I quoted in the log file.
> >> >> >> >
> >> >> >> > Patrick
> >> >> >> >
> >> >> >> > On Mon, Apr 24, 2023 at 12:34 PM Thomas Dullien <
> thomas.dull...@elastic.co.invalid> wrote:
> >> >> >> >>
> >> >> >> >> Hey all,
> >> >> >> >>
> >> >> >> >> I've been experimenting with fixing some low-hanging performance
> >> >> >> >> fruit in the Elasticsearch codebase, and came across the fact
> >> >> >> >> that the MurmurHash implementation used by BytesRef.hashCode()
> >> >> >> >> reads 4 bytes per loop iteration (likely an artifact of 32-bit
> >> >> >> >> architectures, which are ever less important). I made a small fix
> >> >> >> >> to change the implementation to read 8 bytes per loop iteration;
> >> >> >> >> I expected a very small impact (2-3% CPU or so over an indexing
> >> >> >> >> run in Elasticsearch), but got a pretty nontrivial throughput
> >> >> >> >> improvement over a few indexing benchmarks.
> >> >> >> >>
> >> >> >> >> I tried running Lucene-only benchmarks, and succeeded in running
> >> >> >> >> the example from https://github.com/mikemccand/luceneutil - but I
> >> >> >> >> couldn't figure out how to run indexing benchmarks or how to
> >> >> >> >> interpret the results.
> >> >> >> >>
> >> >> >> >> Could someone help me run the benchmarks for the attached patch?
> >> >> >> >>
> >> >> >> >> Cheers,
> >> >> >> >> Thomas
> >> >> >> >>
> >> >> >> >>
> ---------------------------------------------------------------------
> >> >> >> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> >> >> >> For additional commands, e-mail: dev-h...@lucene.apache.org
> >> >> >>