[
https://issues.apache.org/jira/browse/LUCENE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-5731:
--------------------------------
Attachment: LUCENE-5731.patch
Attached is a patch:
* Added new DirectWriter and DirectReader. They support > 2B values and don't
have concepts like 'acceptableOverhead'; instead it's just simple and ensures
every bpv (bits per value) is fast.
* Added RandomAccessInput API (default -> seek+read), with an optimized impl for
mmap.
* Added 3 bytes of padding to the end of every DirectWriter stream, so every
decode is a single I/O operation.
* DirectReader enforces its use
* Added new Lucene49DocValuesFormat using this stuff.
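
To illustrate the "default -> seek+read" fallback versus an absolute read, here is a minimal sketch. It is not the code from the patch: the class and interface names (PositionalReadSketch, PositionalInput, SeekThenRead, AbsoluteRead) are hypothetical, and a plain java.nio.ByteBuffer stands in for both a generic input and an mmap'd file.

```java
import java.nio.ByteBuffer;

public class PositionalReadSketch {
    /** Hypothetical positional-read interface; an assumption, not the patch's API. */
    public interface PositionalInput {
        int readInt(long pos);
    }

    /** Fallback: emulate a positional read as seek + relative read. */
    public static final class SeekThenRead implements PositionalInput {
        private final ByteBuffer buf;

        public SeekThenRead(ByteBuffer buf) {
            // Private cursor; duplicate() does not keep the byte order, so copy it.
            this.buf = buf.duplicate().order(buf.order());
        }

        @Override
        public int readInt(long pos) {
            buf.position((int) pos); // the "seek": moves the cursor
            return buf.getInt();     // relative read: advances the cursor again
        }
    }

    /** Direct variant: a single absolute read, no cursor state touched. */
    public static final class AbsoluteRead implements PositionalInput {
        private final ByteBuffer buf;

        public AbsoluteRead(ByteBuffer buf) {
            this.buf = buf;
        }

        @Override
        public int readInt(long pos) {
            return buf.getInt((int) pos);
        }
    }
}
```

Both variants return the same value for the same position; the second one simply skips the cursor bookkeeping, which is the kind of saving an mmap-backed implementation can exploit.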
Across every bitsPerValue I see consistent performance gains, usually 50-75%
over trunk today.
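
As a rough sketch of why the 3 padding bytes help (this is my illustration, not the patch's code): with values packed MSB-first at up to 24 bits each, any value can be decoded from one 4-byte read starting at the byte where it begins, and the padding guarantees that read never runs past the end of the stream. The class name, method names, and the 24-bit limit below are assumptions made for the example.

```java
import java.nio.ByteBuffer;

public class PaddedPackedSketch {
    /** Extra bytes appended so a 4-byte read at any data byte stays in bounds. */
    public static final int PADDING_BYTES = 3;

    /** Packs values MSB-first at bitsPerValue bits each (1..24), then pads. */
    public static ByteBuffer pack(long[] values, int bitsPerValue) {
        long totalBits = (long) values.length * bitsPerValue;
        int dataBytes = (int) ((totalBits + 7) / 8);
        byte[] out = new byte[dataBytes + PADDING_BYTES];
        long bitPos = 0;
        for (long v : values) {
            for (int b = bitsPerValue - 1; b >= 0; b--, bitPos++) {
                if (((v >>> b) & 1L) != 0) {
                    out[(int) (bitPos >>> 3)] |= (byte) (0x80 >>> (int) (bitPos & 7));
                }
            }
        }
        return ByteBuffer.wrap(out); // wrapped buffers are big-endian by default
    }

    /** Decodes value number index with exactly one absolute 4-byte read. */
    public static long get(ByteBuffer in, int bitsPerValue, int index) {
        long bitPos = (long) index * bitsPerValue;
        int byteOffset = (int) (bitPos >>> 3);
        int bitInByte = (int) (bitPos & 7);
        int word = in.getInt(byteOffset);            // the single read
        int shift = 32 - bitInByte - bitsPerValue;   // >= 1 for bitsPerValue <= 24
        return (word >>> shift) & ((1L << bitsPerValue) - 1);
    }

    public static void main(String[] args) {
        long[] values = {5, 0, 4095, 1234, 7};
        ByteBuffer packed = pack(values, 12);
        for (int i = 0; i < values.length; i++) {
            if (get(packed, 12, i) != values[i]) {
                throw new IllegalStateException("round trip failed at " + i);
            }
        }
        System.out.println("round trip ok");
    }
}
```

Without the padding, the read for the last value could need a bounds check or a slower byte-by-byte path; with it, every index takes the same branch-free route.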
> split direct packed ints from in-ram ones
> -----------------------------------------
>
> Key: LUCENE-5731
> URL: https://issues.apache.org/jira/browse/LUCENE-5731
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
> Assignee: Robert Muir
> Attachments: LUCENE-5731.patch
>
>
> Currently there is an oversharing problem in packedints that imposes too many
> requirements on improving it:
> * every packed ints must be able to be loaded directly, or in RAM, or
> iterated with.
> * things like file pointers are expected to be adjusted in all cases (this is
> especially stupid)
> * lots of unnecessary abstractions
> * versioning etc is complex
> None of this flexibility is needed or buys us anything, and it prevents
> performance improvements (e.g. I just want to add 3 bytes at the end of
> on-disk streams to reduce the number of ByteBuffer calls, and that's seriously
> impossible with the current situation).