[ 
https://issues.apache.org/jira/browse/LUCENE-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-5703:
---------------------------------

    Attachment: LUCENE-5703.patch

Here is a patch that switches BinaryDocValues to the discussed API, as well as 
Sorted(Set)DocValues.lookupOrd for consistency.

 - the default codec as well as memory, direct and disk no longer allocate the 
byte[] in BinaryDocValues.get.
 - the default codec takes advantage of the maximum length of binary terms, 
which is exposed in the metadata, so it never has to resize the BytesRef that 
stores the term.
 - old codecs (lucene40, lucene42) have moved to the new API but still allocate 
the byte[] on the fly.
 - fixed grouping and comparators so that they no longer assume they own the 
bytes.
 - removed the two tests from BaseDocValuesFormatTestCase that ensured that 
each return value had its own bytes.
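
To illustrate the ownership change described above, here is a minimal, 
self-contained sketch. It does not use the Lucene classes or the code from the 
patch: Slice, ReusingProducer and the two retain* helpers are hypothetical 
names made up for this example, and Slice is only a stand-in for a 
BytesRef-like view. The producer sizes one buffer to the longest value and 
reuses it on every get, so a consumer that wants to keep a value across calls 
has to copy it first.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustration only: hypothetical types, not the Lucene API or the patch. */
final class ReusedBytesSketch {

    /** Minimal offset/length view over a byte[] (stand-in for a BytesRef-like type). */
    static final class Slice {
        final byte[] bytes;
        int offset;
        int length;

        Slice(int capacity) {
            this.bytes = new byte[capacity];
        }

        /** Returns a private copy of the viewed bytes, owned by the caller. */
        byte[] copy() {
            return Arrays.copyOfRange(bytes, offset, offset + length);
        }

        String utf8() {
            return new String(bytes, offset, length, StandardCharsets.UTF_8);
        }
    }

    /** Producer that reuses one buffer, sized once to the longest value. */
    static final class ReusingProducer {
        private final byte[][] values;
        private final Slice scratch;

        ReusingProducer(String... terms) {
            values = new byte[terms.length][];
            int maxLength = 0;
            for (int i = 0; i < terms.length; i++) {
                values[i] = terms[i].getBytes(StandardCharsets.UTF_8);
                maxLength = Math.max(maxLength, values[i].length);
            }
            // Knowing the maximum length up front means the buffer never grows.
            scratch = new Slice(maxLength);
        }

        /** The returned slice is only valid until the next call to get. */
        Slice get(int docID) {
            byte[] value = values[docID];
            System.arraycopy(value, 0, scratch.bytes, 0, value.length);
            scratch.offset = 0;
            scratch.length = value.length;
            return scratch;
        }
    }

    /** Buggy consumer: keeps the returned slices as if it owned their bytes. */
    static List<String> retainWithoutCopy(ReusingProducer producer, int docCount) {
        List<Slice> kept = new ArrayList<>();
        for (int doc = 0; doc < docCount; doc++) {
            kept.add(producer.get(doc));
        }
        List<String> out = new ArrayList<>();
        for (Slice slice : kept) {
            out.add(slice.utf8());
        }
        return out;
    }

    /** Correct consumer: copies each value before asking for the next one. */
    static List<String> retainWithCopy(ReusingProducer producer, int docCount) {
        List<byte[]> kept = new ArrayList<>();
        for (int doc = 0; doc < docCount; doc++) {
            kept.add(producer.get(doc).copy());
        }
        List<String> out = new ArrayList<>();
        for (byte[] owned : kept) {
            out.add(new String(owned, StandardCharsets.UTF_8));
        }
        return out;
    }
}
```

In this sketch, retainWithoutCopy ends up with every entry equal to the last 
value that was read, because all retained slices point at the same buffer, 
while retainWithCopy preserves each value. This is the kind of assumption that 
the grouping and comparator changes had to stop making.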

Tests pass (I have already run the whole suite 6 times) and I'll run benchmarks 
soon to make sure this doesn't introduce a performance regression.

> Don't allocate/copy bytes all the time in binary DV producers
> -------------------------------------------------------------
>
>                 Key: LUCENE-5703
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5703
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>             Fix For: 4.9, 5.0
>
>         Attachments: LUCENE-5703.patch
>
>
> Our binary doc values producers keep on creating new {{byte[]}} arrays and 
> copying bytes when a value is requested, which likely doesn't help 
> performance. This has been done because of the way fieldcache consumers used 
> the API, but we should try to fix it in 5.0.


