[ 
https://issues.apache.org/jira/browse/LUCENE-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2816:
--------------------------------

    Attachment: LUCENE-2816.patch

Here's the most important benchmark: speeding up MultiMMapIndexInput's
readByte(s) in general:

MultiMMapIndexInput readByte(s) improvements [trunk, Standard codec]
||Query||QPS trunk||QPS patch||Pct diff||
|spanFirst(unit, 5)|12.72|12.85|{color:green}1.0%{color}|
|+nebraska +state|137.47|139.33|{color:green}1.3%{color}|
|spanNear([unit, state], 10, true)|2.90|2.94|{color:green}1.4%{color}|
|"unit state"|5.88|5.99|{color:green}1.8%{color}|
|unit~2.0|7.06|7.20|{color:green}2.0%{color}|
|+unit +state|8.68|8.87|{color:green}2.2%{color}|
|unit state|8.00|8.23|{color:green}2.9%{color}|
|unit~1.0|7.19|7.41|{color:green}3.0%{color}|
|unit*|22.66|23.41|{color:green}3.3%{color}|
|uni*|12.54|13.12|{color:green}4.6%{color}|
|united~1.0|10.61|11.12|{color:green}4.8%{color}|
|united~2.0|2.52|2.65|{color:green}5.1%{color}|
|state|28.72|30.23|{color:green}5.3%{color}|
|un*d|44.84|48.06|{color:green}7.2%{color}|
|u*d|13.17|14.51|{color:green}10.2%{color}|

In the bulk postings branch, I've been experimenting with various techniques
for FOR/PFOR, and one thing I tried was simply decoding with readInt() from
the DataInput. So I adapted FOR/PFOR to just take a DataInput and work on it
directly, instead of reading into a byte[], wrapping it with a ByteBuffer,
and working on an IntBuffer view.
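
To make the two decoding paths concrete, here is a minimal sketch. It is not the FOR/PFOR code from the branch: it only reads raw 32-bit values, it uses java.io.DataInput as a stand-in for Lucene's DataInput, and the names IntDecodeSketch, decodeViaIntBuffer, decodeViaReadInt and inputOf are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.util.Arrays;

// Hypothetical illustration class, not part of Lucene or the patch.
class IntDecodeSketch {

    // Path A: copy the block into a byte[], wrap it, and read via an IntBuffer view.
    static int[] decodeViaIntBuffer(DataInput in, int count) {
        try {
            byte[] block = new byte[count * 4];
            in.readFully(block);
            // ByteBuffer.wrap() yields a big-endian buffer by default.
            IntBuffer view = ByteBuffer.wrap(block).asIntBuffer();
            int[] out = new int[count];
            view.get(out);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Path B: read each int straight from the input, with no intermediate copy.
    static int[] decodeViaReadInt(DataInput in, int count) {
        try {
            int[] out = new int[count];
            for (int i = 0; i < count; i++) {
                out[i] = in.readInt(); // java.io.DataInput reads big-endian ints
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Test helper: serialize the values big-endian and expose them as a DataInput.
    static DataInput inputOf(int... values) {
        ByteBuffer bb = ByteBuffer.allocate(values.length * 4);
        for (int v : values) {
            bb.putInt(v);
        }
        return new DataInputStream(new ByteArrayInputStream(bb.array()));
    }

    public static void main(String[] args) {
        int[] a = decodeViaIntBuffer(inputOf(7, -1, 300000), 3);
        int[] b = decodeViaReadInt(inputOf(7, -1, 300000), 3);
        System.out.println(Arrays.equals(a, b));
    }
}
```

Both paths return the same values; path B is only attractive if readInt() on the underlying input is itself cheap.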

But when I did this, I found that MMap was slow for readInt(), etc. So the
patch implements these primitives with ByteBuffer.getInt(), etc. This isn't
very important, and is mostly theoretical, since Lucene doesn't use these
much, but I still think things like readInt(), readShort(), and readLong()
should be fast. For example, just earlier today someone posted an alternative
PFOR implementation on LUCENE-1410 that uses DataInput.readInt().
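
As a rough illustration of the difference, the sketch below contrasts a readInt() assembled from four single-byte reads with one that delegates to the buffer's own getInt() (the ByteBuffer counterpart of readInt()). It assumes a plain heap ByteBuffer in place of a memory-mapped one, and the class and method names are hypothetical rather than taken from the patch.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration class, not the actual MMapIndexInput.
class ReadIntSketch {
    private final ByteBuffer buffer;

    ReadIntSketch(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    byte readByte() {
        return buffer.get();
    }

    // Generic composition: four single-byte reads, most significant byte first.
    // Each get() performs its own bounds check.
    int readIntByteAtATime() {
        return ((readByte() & 0xFF) << 24)
             | ((readByte() & 0xFF) << 16)
             | ((readByte() & 0xFF) << 8)
             |  (readByte() & 0xFF);
    }

    // Direct delegation: a single relative getInt() on the buffer,
    // which uses the buffer's byte order (big-endian by default).
    int readIntDirect() {
        return buffer.getInt();
    }

    public static void main(String[] args) {
        ByteBuffer data = ByteBuffer.allocate(8).putInt(0x12345678).putInt(-2);
        data.flip();
        ReadIntSketch slow = new ReadIntSketch(data.duplicate());
        ReadIntSketch fast = new ReadIntSketch(data.duplicate());
        System.out.println(slow.readIntByteAtATime() == fast.readIntDirect());
        System.out.println(slow.readIntByteAtATime() == fast.readIntDirect());
    }
}
```

With the default big-endian order the two methods decode the same value, so the second is a drop-in replacement for the first in this sketch.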

MMapIndexInput readInt() improvements [bulkpostings, FrameOfRefDataInput codec]
||Query||QPS branch||QPS patch||Pct diff||
|spanFirst(unit, 5)|12.14|11.99|{color:red}-1.2%{color}|
|united~1.0|11.32|11.33|{color:green}0.1%{color}|
|united~2.0|2.51|2.56|{color:green}2.1%{color}|
|unit~1.0|6.98|7.19|{color:green}3.0%{color}|
|unit~2.0|6.88|7.11|{color:green}3.3%{color}|
|spanNear([unit, state], 10, true)|2.81|2.96|{color:green}5.2%{color}|
|unit state|8.04|8.59|{color:green}6.8%{color}|
|+unit +state|10.97|12.12|{color:green}10.5%{color}|
|unit*|26.67|29.80|{color:green}11.7%{color}|
|"unit state"|5.59|6.27|{color:green}12.3%{color}|
|uni*|15.10|17.51|{color:green}15.9%{color}|
|state|33.20|38.72|{color:green}16.6%{color}|
|+nebraska +state|59.17|71.45|{color:green}20.8%{color}|
|un*d|35.98|47.14|{color:green}31.0%{color}|
|u*d|9.48|12.46|{color:green}31.4%{color}|

Here's the same benchmark of DataInput.readInt(), but with
MultiMMapIndexInput:

MultiMMapIndexInput readInt() improvements [bulkpostings, FrameOfRefDataInput codec]
||Query||QPS branch||QPS patch||Pct diff||
|united~2.0|2.43|2.54|{color:green}4.3%{color}|
|united~1.0|10.78|11.39|{color:green}5.7%{color}|
|unit~1.0|6.81|7.21|{color:green}5.8%{color}|
|unit~2.0|6.62|7.05|{color:green}6.5%{color}|
|spanNear([unit, state], 10, true)|2.77|2.96|{color:green}6.6%{color}|
|unit state|7.85|8.53|{color:green}8.7%{color}|
|spanFirst(unit, 5)|10.50|11.71|{color:green}11.5%{color}|
|+unit +state|10.26|11.94|{color:green}16.3%{color}|
|"unit state"|5.39|6.31|{color:green}17.0%{color}|
|state|31.95|39.17|{color:green}22.6%{color}|
|unit*|24.39|31.02|{color:green}27.2%{color}|
|+nebraska +state|54.68|71.98|{color:green}31.6%{color}|
|u*d|9.53|12.62|{color:green}32.5%{color}|
|uni*|13.72|18.23|{color:green}32.9%{color}|
|un*d|35.87|48.19|{color:green}34.3%{color}|

Just to be sure, I also ran this last one on sparc64 (big-endian).

MultiMMapIndexInput readInt() improvements [bulkpostings, FrameOfRefDataInput codec]
||Query||QPS branch||QPS patch||Pct diff||
|united~2.0|2.23|2.26|{color:green}1.5%{color}|
|unit~2.0|6.37|6.47|{color:green}1.6%{color}|
|united~1.0|11.33|11.59|{color:green}2.3%{color}|
|unit~1.0|9.68|10.05|{color:green}3.7%{color}|
|spanNear([unit, state], 10, true)|15.60|17.54|{color:green}12.5%{color}|
|unit*|127.14|144.08|{color:green}13.3%{color}|
|unit state|44.93|51.30|{color:green}14.2%{color}|
|spanFirst(unit, 5)|58.42|68.37|{color:green}17.0%{color}|
|uni*|56.66|67.53|{color:green}19.2%{color}|
|+nebraska +state|215.62|262.99|{color:green}22.0%{color}|
|+unit +state|63.18|77.86|{color:green}23.2%{color}|
|"unit state"|32.24|40.05|{color:green}24.2%{color}|
|u*d|29.13|36.69|{color:green}26.0%{color}|
|state|145.99|188.33|{color:green}29.0%{color}|
|un*d|65.27|87.20|{color:green}33.6%{color}|

I think some of these benchmarks also show that MultiMMapIndexInput might now
be essentially just as fast as MMapIndexInput... but let's not go there yet
and keep them separate for now.


> MMapDirectory speedups
> ----------------------
>
>                 Key: LUCENE-2816
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2816
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 3.1, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-2816.patch
>
>
> MMapDirectory has some performance problems:
> # When the file is larger than Integer.MAX_VALUE, we use MultiMMapIndexInput,
> which does a lot of unnecessary bounds checks for its buffer switching, etc.
> Instead, like MMapIndexInput, it should rely upon the contract of these
> operations in ByteBuffer (which will always do a bounds check and throw
> BufferUnderflowException). Our 'buffer' is so large (Integer.MAX_VALUE) that
> it's rare this happens, and doing our own bounds checks just slows things
> down.
> # readInt()/readLong()/readShort() are slow and should just defer to
> ByteBuffer.getInt(), etc. This isn't very important since we don't use these
> much, but I think there's no reason users (e.g. codec writers) should have to
> readBytes() + wrap as a ByteBuffer + get an IntBuffer view when readInt() can
> be almost as fast...
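
The first point in the description above can be sketched roughly as follows. This is an illustration under assumptions, not the code in LUCENE-2816.patch: small heap buffers stand in for the mapped chunks, the names MultiBufferSketch, readByteChecked and nextBuffer are hypothetical, and an unchecked exception replaces the IOException a real IndexInput would throw at end of file.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Hypothetical illustration class, not the actual MultiMMapIndexInput.
class MultiBufferSketch {
    private final ByteBuffer[] buffers;
    private int curBufIndex = 0;
    private ByteBuffer curBuf;

    MultiBufferSketch(ByteBuffer... buffers) {
        this.buffers = buffers;
        this.curBuf = buffers[0];
    }

    // Checked variant: an explicit remaining-bytes test on every call,
    // on top of the bounds check that get() performs anyway.
    byte readByteChecked() {
        while (!curBuf.hasRemaining()) {
            nextBuffer();
        }
        return curBuf.get();
    }

    // Exception-driven variant: rely on ByteBuffer.get() throwing
    // BufferUnderflowException at the end of a chunk, and switch
    // buffers only on that rare path.
    byte readByte() {
        try {
            return curBuf.get();
        } catch (BufferUnderflowException e) {
            nextBuffer();
            return readByte();
        }
    }

    // Advance to the next chunk, or fail if there is none.
    private void nextBuffer() {
        curBufIndex++;
        if (curBufIndex >= buffers.length) {
            throw new IllegalStateException("read past EOF");
        }
        curBuf = buffers[curBufIndex];
    }

    public static void main(String[] args) {
        MultiBufferSketch in = new MultiBufferSketch(
            ByteBuffer.wrap(new byte[] {1, 2}),
            ByteBuffer.wrap(new byte[] {3}));
        for (int i = 0; i < 3; i++) {
            System.out.println(in.readByte());
        }
    }
}
```

With chunks close to Integer.MAX_VALUE bytes, the catch block runs at most once per chunk, so the common path is a single get() call.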

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

