[ 
https://issues.apache.org/jira/browse/LUCENE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575192#comment-13575192
 ] 

Robert Muir commented on LUCENE-4764:
-------------------------------------

{quote}
 The problem is that packed-ints is only good if you know something about the 
numbers, i.e. their size, distribution etc. But with category ordinals, on this 
Wikipedia index, there's nothing "special" about them. Really every document 
keeps close to arbitrary integers between 1 - 2.2M
{quote}

{quote}
If the following math holds – 25 ords per document
{quote}

Right, but I don't look at what it's doing this way. Today the ords for the 
document are vint-deltas (or similar) within a byte[], right?

So instead perhaps the codec could encode the "first ord" (minimum) for the doc 
in a simple int[] or whatever, but the additional deltas are all within a big 
packed stream or something like that.
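
A minimal sketch of that layout, to make the idea concrete. It is not taken from the attached patch and does not use Lucene's packed-ints classes. The class name FirstOrdPlusPackedDeltas, the extra per-document offset array, and the single fixed bit width for all deltas are assumptions made for illustration. It also assumes every document has at least one ord and that ords are sorted ascending.

```java
import java.util.Arrays;

/** Hypothetical sketch: first ord per doc in an int[], remaining deltas bit-packed. */
final class FirstOrdPlusPackedDeltas {
    private final int[] firstOrd;    // minimum ord of each doc
    private final int[] deltaStart;  // index of each doc's first delta; length numDocs + 1
    private final long[] packed;     // all deltas, bitsPerDelta bits each
    private final int bitsPerDelta;  // assumption: one fixed width for the whole stream

    /** ordsPerDoc[doc] must be non-empty and sorted ascending (assumption). */
    FirstOrdPlusPackedDeltas(int[][] ordsPerDoc) {
        int numDocs = ordsPerDoc.length;
        firstOrd = new int[numDocs];
        deltaStart = new int[numDocs + 1];
        int maxDelta = 0;
        int total = 0;
        for (int doc = 0; doc < numDocs; doc++) {
            int[] ords = ordsPerDoc[doc];
            firstOrd[doc] = ords[0];
            deltaStart[doc] = total;
            for (int i = 1; i < ords.length; i++) {
                maxDelta = Math.max(maxDelta, ords[i] - ords[i - 1]);
            }
            total += ords.length - 1;
        }
        deltaStart[numDocs] = total;
        // Bits needed for the largest delta; at least 1 so the mask is never empty.
        bitsPerDelta = Math.max(1, 32 - Integer.numberOfLeadingZeros(maxDelta));
        packed = new long[(int) (((long) total * bitsPerDelta + 63) / 64)];
        int index = 0;
        for (int[] ords : ordsPerDoc) {
            for (int i = 1; i < ords.length; i++) {
                write(index++, ords[i] - ords[i - 1]);
            }
        }
    }

    private void write(int index, int value) {
        long bitPos = (long) index * bitsPerDelta;
        int word = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        packed[word] |= ((long) value) << shift;
        if (shift + bitsPerDelta > 64) {
            // The value straddles two longs: spill the high bits into the next word.
            packed[word + 1] |= ((long) value) >>> (64 - shift);
        }
    }

    private int read(int index) {
        long bitPos = (long) index * bitsPerDelta;
        int word = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long v = packed[word] >>> shift;
        if (shift + bitsPerDelta > 64) {
            v |= packed[word + 1] << (64 - shift);
        }
        return (int) (v & ((1L << bitsPerDelta) - 1));
    }

    /** Decodes all ords of one document by summing deltas onto the first ord. */
    int[] ords(int doc) {
        int start = deltaStart[doc];
        int count = deltaStart[doc + 1] - start;
        int[] out = new int[count + 1];
        out[0] = firstOrd[doc];
        for (int i = 0; i < count; i++) {
            out[i + 1] = out[i] + read(start + i);
        }
        return out;
    }

    int bitsPerDelta() {
        return bitsPerDelta;
    }
}
```

With a single fixed width, one large gap between two ords sets the width for every delta in the stream, so a real format would probably pack in blocks with a per-block width. That choice is left open here.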

In all cases I like the idea of a specialized DocValuesFormat for facets. It 
doesn't have to be one-size-fits-all: it could have a number of strategies 
depending on whether someone had 5 ords/doc or 500 ords/doc, for example, by 
examining the iterator once at index time to decide.
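
As a rough illustration of picking a strategy from one pass over the values, here is a small sketch. The FacetStrategyChooser class, the two strategy names, and the threshold parameter are all hypothetical. A real format would likely look at more than the average (for example the delta distribution), and nothing here reflects an existing Lucene API.

```java
import java.util.Iterator;

/** Hypothetical sketch: choose an encoding after a single pass over per-doc ords. */
final class FacetStrategyChooser {
    /** Illustrative strategy names only; not real Lucene formats. */
    enum Strategy { FEW_ORDS_PER_DOC, MANY_ORDS_PER_DOC }

    /**
     * Consumes the iterator once, computes the average number of ords per
     * document, and compares it against a caller-supplied threshold.
     */
    static Strategy choose(Iterator<int[]> ordsPerDoc, double avgOrdsThreshold) {
        long docs = 0;
        long ords = 0;
        while (ordsPerDoc.hasNext()) {
            ords += ordsPerDoc.next().length;
            docs++;
        }
        double avg = docs == 0 ? 0.0 : (double) ords / docs;
        return avg <= avgOrdsThreshold
                ? Strategy.FEW_ORDS_PER_DOC
                : Strategy.MANY_ORDS_PER_DOC;
    }
}
```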
                
> Faster but more RAM/Disk consuming DocValuesFormat for facets
> -------------------------------------------------------------
>
>                 Key: LUCENE-4764
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4764
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.2, 5.0
>
>         Attachments: LUCENE-4764.patch
>
>
> The new default DV format for binary fields has much more
> RAM-efficient encoding of the address for each document ... but it's
> also a bit slower at decode time, which affects facets because we
> decode for every collected docID.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
