[jira] [Commented] (LUCENE-7122) BytesRefArray can be more efficient for fixed width values

David Smiley (JIRA) Mon, 21 Mar 2016 17:49:45 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205479#comment-15205479 ]


David Smiley commented on LUCENE-7122:
--------------------------------------

I'm coming at this from the usefulness of the data structure/utility within 
Lucene (since that's where BytesRef is defined, of course), not from an 
efficiency standpoint, although I could make such an argument.  For that, look 
no further than your friend Mike, who is using it as described in this issue.  
Let's consider MemoryIndex, which uses BytesRefArray to hold the payload data 
for each term position.  Use of BytesRefArray for this is pretty 
straightforward (see for yourself).  If BRA didn't exist, it would be more 
complicated to do without it given what else is in scope.  The postingsWriter 
variable is a SliceWriter, which just has writeInt; that'd be awkward to write 
the payload with.  Or perhaps, as an alternative, a new BytesRefHash could be 
created, although there's more overhead in using that than a simple 
BytesRefArray.  Or might you propose an ArrayList of deep-copied BytesRef?  
Ugh; think of all the GC and overhead _for each token position_.  Or perhaps 
you have another solution in mind?
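
To make the comparison concrete, here is a minimal Java sketch of the general 
idea behind a packed byte-array store: every value is copied into one shared, 
growable byte[] and only an int offset is kept per value, rather than one 
object plus one array per value as with a list of deep copies.  The class name 
PackedByteValues and its methods are hypothetical and written only for 
illustration; this is not Lucene's BytesRefArray implementation or API.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not Lucene's BytesRefArray. */
final class PackedByteValues {
    private byte[] data = new byte[16]; // all values stored back to back
    private int[] starts = new int[4];  // starts[i] = offset of value i; starts[size] = end of data
    private int size;                   // number of values appended so far

    /** Copies length bytes from src into the shared buffer and returns the new value's index. */
    int append(byte[] src, int offset, int length) {
        int start = starts[size];
        if (start + length > data.length) {
            // Grow the shared buffer; at least double it to amortize copying.
            data = Arrays.copyOf(data, Math.max(data.length * 2, start + length));
        }
        if (size + 2 > starts.length) {
            starts = Arrays.copyOf(starts, starts.length * 2);
        }
        System.arraycopy(src, offset, data, start, length);
        starts[++size] = start + length;
        return size - 1;
    }

    int size() {
        return size;
    }

    /** Returns a copy of the value at index. */
    byte[] get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index=" + index + ", size=" + size);
        }
        return Arrays.copyOfRange(data, starts[index], starts[index + 1]);
    }
}
```

In this sketch, appending a payload allocates nothing per value (apart from 
occasional buffer growth), which is the property that matters when there is 
one payload per token position.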

bq. I want to see numbers. If you like this class, be prepared to defend it

That's a little aggressive.  Anyway, the reverse of your argument could be 
made: defend that removing uses of something doesn't slow other things down.  
_Shrug_.

Hey, by the way, I think it would be useful for us to consider modifying our 
Javadoc publishing to exclude {{@lucene.internal}} classes.  I think that 
would help assuage some of the concern at the root of this matter.

> BytesRefArray can be more efficient for fixed width values
> ----------------------------------------------------------
>
>                 Key: LUCENE-7122
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7122
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: master, 6.1
>
>         Attachments: LUCENE-7122.patch, LUCENE-7122.patch
>
>
> Today {{BytesRefArray}} uses one int ({{int[]}}, overallocated) per
> value to hold the length, but for dimensional points these values are
> always the same length. 
> Skipping the per-value length in that case can save another 4 bytes of
> heap per indexed dimensional point, which is a big improvement (more
> points can fit in heap at once) for 1D and 2D lat/lon points.
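
The saving described in the issue can be sketched as follows, assuming every 
value has the same width and that width is known up front.  With a fixed 
width, value i starts at i * width, so no per-value int is needed at all.  
FixedWidthByteValues is a hypothetical class written for illustration; it is 
not the attached patch and not Lucene code.

```java
import java.util.Arrays;

/** Hypothetical illustration only; assumes all values have the same width. */
final class FixedWidthByteValues {
    private final int width; // bytes per value, stored once instead of once per value
    private byte[] data;     // all values stored back to back
    private int size;        // number of values appended so far

    FixedWidthByteValues(int width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive: " + width);
        }
        this.width = width;
        this.data = new byte[width * 4];
    }

    /** Copies src into the shared buffer and returns the new value's index. */
    int append(byte[] src) {
        if (src.length != width) {
            throw new IllegalArgumentException("expected " + width + " bytes, got " + src.length);
        }
        int start = size * width; // sketch only: ignores int overflow for very large stores
        if (start + width > data.length) {
            data = Arrays.copyOf(data, data.length * 2);
        }
        System.arraycopy(src, 0, data, start, width);
        return size++;
    }

    int size() {
        return size;
    }

    /** Returns a copy of the value at index; the offset is computed, not looked up. */
    byte[] get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index=" + index + ", size=" + size);
        }
        int start = index * width;
        return Arrays.copyOfRange(data, start, start + width);
    }
}
```

Compared with a variable-width store that keeps one int per value, the only 
per-value heap cost here is the value's own bytes.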



