[ https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552594#comment-13552594 ]

Shai Erera commented on LUCENE-4620:
------------------------------------

I'm baffled too. There is some overhead with the bulk API, in that it needs to
{{grow()}} the {{IntsBuffer}} (something it didn't need to do before). But I
believe this growing should stabilize after a few docs (i.e. once the array
is large enough). Still, every iteration checks whether the array is large
enough, so if we grow the IntsRef upfront (even if that over-allocates), we can
remove those 'if' checks.
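To make the trade-off concrete, here is a minimal sketch of the two loop shapes. It is not the decoder code from the patch: it assumes a fixed 4-byte big-endian format for simplicity, and {{IntsHolder}} and {{FixedWidthDecoding}} are hypothetical stand-ins (an int[] plus a logical length, playing the role of IntsRef).

```java
import java.util.Arrays;

// Hypothetical stand-in for IntsRef: a reusable int[] plus a logical length.
final class IntsHolder {
  int[] ints = new int[0];
  int length;

  // Ensures capacity >= minSize, keeping existing values (grows by ~1.5x).
  void grow(int minSize) {
    if (ints.length < minSize) {
      ints = Arrays.copyOf(ints, Math.max(minSize, ints.length + (ints.length >> 1)));
    }
  }
}

final class FixedWidthDecoding {
  // Reads a big-endian 4-byte int starting at offset i.
  static int readInt(byte[] b, int i) {
    return ((b[i] & 0xFF) << 24) | ((b[i + 1] & 0xFF) << 16)
        | ((b[i + 2] & 0xFF) << 8) | (b[i + 3] & 0xFF);
  }

  // Variant 1: a capacity check ('if') before every decoded value.
  // Assumes len is a multiple of 4.
  static void decodeChecked(byte[] buf, int off, int len, IntsHolder out) {
    out.length = 0;
    for (int i = off; i < off + len; i += 4) {
      if (out.length == out.ints.length) {
        out.grow(out.length + 1);
      }
      out.ints[out.length++] = readInt(buf, i);
    }
  }

  // Variant 2: grow once to the known count (len / 4), so the loop has no checks.
  // Assumes len is a multiple of 4.
  static void decodePresized(byte[] buf, int off, int len, IntsHolder out) {
    out.grow(len / 4);
    out.length = 0;
    for (int i = off; i < off + len; i += 4) {
      out.ints[out.length++] = readInt(buf, i);
    }
  }
}
```

Both variants decode the same values; the second only hoists the capacity check out of the loop.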

SimpleIntDecoder can do this easily: it knows each value takes 4 bytes, so it
should just grow to buf.length / 4. VInt is trickier, but to be on the safe
side it can grow to buf.length, since each value occupies at least one
byte. Some other decoders are trickier still, but they are not used in your
test above.
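For VInt, the buf.length bound could be applied roughly as in this sketch. {{VIntBulkDecoder}} is hypothetical, and it assumes the layout used by Lucene's DataOutput.writeVInt (low-order 7 bits first, high bit set means more bytes follow); the facet module's own VInt encoder may order bytes differently.

```java
import java.util.Arrays;

// Hypothetical bulk VInt decoder (not the facet module's actual class).
final class VIntBulkDecoder {
  int[] values = new int[0]; // reusable output buffer, like IntsRef.ints
  int length;                // number of values decoded by the last call

  // Decodes VInts from buf[off, off + len). Every value takes at least one
  // byte, so len is an upper bound on the count: grow once, then decode
  // without any per-value capacity check.
  void decode(byte[] buf, int off, int len) {
    if (values.length < len) {
      values = Arrays.copyOf(values, len); // may over-allocate
    }
    length = 0;
    int end = off + len;
    int i = off;
    while (i < end) {
      int value = 0;
      int shift = 0;
      byte b;
      do {
        b = buf[i++];
        value |= (b & 0x7F) << shift; // low-order group first
        shift += 7;
      } while (b < 0); // high bit set: another byte follows
      values[length++] = value;
    }
  }
}
```

The buffer can end up larger than needed (e.g. when most values take several bytes), which is the price of dropping the per-value check; whether that pays off would need to be measured.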

But I must admit I thought it was a no-brainer that replacing an iterator
API with a bulk one would improve performance. And indeed, {{EncodingSpeed}}
already shows nice improvements. Even if decoding values is not the major
part of faceted search (which I doubt), shouldn't we at worst see no big wins,
rather than slowdowns?
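For reference, the two API shapes being compared look roughly like this. All names here are hypothetical, and the 4-byte big-endian format is assumed only to keep the sketch short. The iterator style makes a call and a state update per value (which the JIT may or may not inline), while the bulk style fills a plain int[] in one call.

```java
// Hypothetical sketch: iterator-style vs bulk decoding of the same bytes.
// Assumed format: big-endian 4-byte ints (SimpleIntDecoder-like), for brevity.
final class DecoderShapes {

  // Iterator style: one next() call per value, decoder state kept in fields.
  static final class StreamingDecoder {
    private byte[] buf;
    private int pos;
    private int end;

    void reset(byte[] buf, int off, int len) {
      this.buf = buf;
      this.pos = off;
      this.end = off + len;
    }

    boolean hasNext() {
      return pos < end;
    }

    int next() {
      int v = readInt(buf, pos);
      pos += 4;
      return v;
    }
  }

  // Bulk style: decode all values in one call; out must hold at least len / 4
  // ints. Returns the number of values written.
  static int decodeBulk(byte[] buf, int off, int len, int[] out) {
    int n = 0;
    for (int i = off; i < off + len; i += 4) {
      out[n++] = readInt(buf, i);
    }
    return n;
  }

  static int readInt(byte[] b, int i) {
    return ((b[i] & 0xFF) << 24) | ((b[i + 1] & 0xFF) << 16)
        | ((b[i + 2] & 0xFF) << 8) | (b[i + 3] & 0xFF);
  }

  // Example consumers: sum the values through each API.
  static long sumStreaming(byte[] buf) {
    StreamingDecoder d = new StreamingDecoder();
    d.reset(buf, 0, buf.length);
    long sum = 0;
    while (d.hasNext()) {
      sum += d.next();
    }
    return sum;
  }

  static long sumBulk(byte[] buf) {
    int[] values = new int[buf.length / 4];
    int n = decodeBulk(buf, 0, buf.length, values);
    long sum = 0;
    for (int k = 0; k < n; k++) {
      sum += values[k];
    }
    return sum;
  }
}
```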
                
> Explore IntEncoder/Decoder bulk API
> -----------------------------------
>
>                 Key: LUCENE-4620
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4620
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 4.1, 5.0
>
>         Attachments: LUCENE-4620.patch, LUCENE-4620.patch, LUCENE-4620.patch
>
>
> Today, IntEncoder/Decoder offer a streaming API, where you can encode(int) 
> and decode(int). Originally, we believed that this layer could be useful for 
> other scenarios, but in practice it's used only for writing/reading the 
> category ordinals to/from payload/DV.
> Therefore, Mike and I would like to explore a bulk API, something like 
> encode(IntsRef, BytesRef) and decode(BytesRef, IntsRef). Perhaps the Encoder 
> can still be streaming (as we don't know in advance how many ints will be 
> written), dunno. Will figure this out as we go.
> One thing to check is whether the bulk API can work w/ e.g. facet 
> associations, which can write arbitrary byte[], so decoding to an 
> IntsRef may not make sense. This too we'll figure out as we go. I don't rule 
> out that associations will use a different bulk API.
> At the end of the day, the requirement is for someone to be able to configure 
> how ordinals are written (i.e. different encoding schemes: VInt, PackedInts 
> etc.) and later read, with as little overhead as possible.
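Unlike a streaming encoder, which does not know in advance how many ints will be written, a bulk encoder sees all values at once and can size its output upfront (a 32-bit int needs at most 5 VInt bytes). The sketch below is hypothetical: {{BulkVIntCodec}} and its signatures only stand in for the proposed encode(IntsRef, BytesRef)/decode(BytesRef, IntsRef), using plain arrays so it runs without Lucene, and it assumes the same low-order-first VInt layout as DataOutput.writeVInt.

```java
import java.util.Arrays;

// Hypothetical bulk VInt codec; names and signatures are illustrative only.
final class BulkVIntCodec {

  // Bulk encode: all values are known, so the output is sized once
  // (at most 5 bytes per 32-bit int) and trimmed at the end.
  static byte[] encode(int[] values, int count) {
    byte[] out = new byte[count * 5];
    int pos = 0;
    for (int k = 0; k < count; k++) {
      int v = values[k];
      while ((v & ~0x7F) != 0) {
        out[pos++] = (byte) ((v & 0x7F) | 0x80); // low 7 bits + continuation bit
        v >>>= 7;
      }
      out[pos++] = (byte) v; // last byte: continuation bit clear
    }
    return Arrays.copyOf(out, pos);
  }

  // Bulk decode: each value takes >= 1 byte, so bytes.length bounds the count.
  static int[] decode(byte[] bytes) {
    int[] out = new int[bytes.length];
    int n = 0;
    int i = 0;
    while (i < bytes.length) {
      int v = 0;
      int shift = 0;
      byte b;
      do {
        b = bytes[i++];
        v |= (b & 0x7F) << shift;
        shift += 7;
      } while (b < 0);
      out[n++] = v;
    }
    return Arrays.copyOf(out, n);
  }
}
```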

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
