[jira] [Commented] (LUCENE-4620) Explore IntEncoder/Decoder bulk API

Shai Erera (JIRA) Wed, 12 Dec 2012 06:13:24 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529966#comment-13529966
 ]


Shai Erera commented on LUCENE-4620:
------------------------------------

Also, today there are a few IntEncoders that are used during indexing only, e.g. 
SortingIntEncoder and UniqueIntEncoder, which guarantee that an ordinal is 
written just once to the payload and sort the ordinals so that DGap can be 
computed afterwards. These do not have a matching Decoder, and they shouldn't, 
because at search time you don't care whether the ords are sorted, and you 
can assume they are unique.
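
To illustrate the indexing-only steps above, here is a minimal sketch in plain 
Java. It is not the actual SortingIntEncoder/UniqueIntEncoder code; the class 
and method names (OrdinalPrep, sortUniqueDGap, undoDGap) are made up for this 
example, and it assumes ordinals are non-negative.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: shows sort -> unique -> d-gap.
final class OrdinalPrep {

    // Sorts the ordinals, drops duplicates, and replaces each remaining
    // value with its difference from the previous one (the "d-gap").
    static int[] sortUniqueDGap(int[] ordinals) {
        int[] sorted = ordinals.clone();
        Arrays.sort(sorted);
        int[] gaps = new int[sorted.length];
        int count = 0;
        int prev = 0;
        for (int i = 0; i < sorted.length; i++) {
            if (i > 0 && sorted[i] == sorted[i - 1]) {
                continue; // duplicate ordinal, write it only once
            }
            gaps[count++] = sorted[i] - prev;
            prev = sorted[i];
        }
        return Arrays.copyOf(gaps, count);
    }

    // Search-time side: only the d-gap needs undoing. Sorting and
    // de-duplication have no inverse, which is why no Decoder matches them.
    static int[] undoDGap(int[] gaps) {
        int[] ordinals = new int[gaps.length];
        int prev = 0;
        for (int i = 0; i < gaps.length; i++) {
            prev += gaps[i];
            ordinals[i] = prev;
        }
        return ordinals;
    }
}
```

For example, the ordinals {7, 3, 3, 10, 7} become the gaps {3, 4, 3}, and 
undoing the gaps yields {3, 7, 10}.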

Another thing that I think we should do is move those encoders into the *.facet 
package. They are currently in the facet module, but under o.a.l.util, b/c 
again we thought at the time that they were a generic piece of code for 
encoding/decoding integers. Lucene has PackedInts and DataInput/Output for 
doing block and VInt encodings. Users can write Codecs for other encoding 
algorithms ... IntEncoder/Decoder are not that generic :).
                
> Explore IntEncoder/Decoder bulk API
> -----------------------------------
>
>                 Key: LUCENE-4620
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4620
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Shai Erera
>
> Today, IntEncoder/Decoder offer a streaming API, where you can encode(int) 
> and decode(int). Originally, we believed that this layer can be useful for 
> other scenarios, but in practice it's used only for writing/reading the 
> category ordinals from payload/DV.
> Therefore, Mike and I would like to explore a bulk API, something like 
> encode(IntsRef, BytesRef) and decode(BytesRef, IntsRef). Perhaps the Encoder 
> can still be streaming (as we don't know in advance how many ints will be 
> written), dunno. Will figure this out as we go.
> One thing to check is whether the bulk API can work w/ e.g. facet 
> associations, which can write arbitrary byte[], and so maybe decoding to an 
> IntsRef won't make sense. This too we'll figure out as we go. I don't rule 
> out that associations will use a different bulk API.
> At the end of the day, the requirement is for someone to be able to configure 
> how ordinals are written (i.e. different encoding schemes: VInt, PackedInts 
> etc.) and later read, with as little overhead as possible.
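
As a rough illustration of what a bulk encode/decode pair might look like, 
here is a self-contained sketch using a VInt-style encoding (7 data bits per 
byte, high bit set when more bytes follow). It is not the API proposed in the 
issue: BulkVIntCodec and its methods are hypothetical, and plain arrays with 
offset/length stand in for IntsRef and BytesRef so the example has no Lucene 
dependency.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical bulk codec, not part of Lucene.
final class BulkVIntCodec {

    // Encodes values[offset .. offset+length) in one call.
    static byte[] encode(int[] values, int offset, int length) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(length);
        for (int i = offset; i < offset + length; i++) {
            int v = values[i];
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80); // low 7 bits, continuation bit set
                v >>>= 7;
            }
            out.write(v); // last byte, continuation bit clear
        }
        return out.toByteArray();
    }

    // Decodes every value stored in bytes[offset .. offset+length).
    static int[] decode(byte[] bytes, int offset, int length) {
        int[] result = new int[length]; // each value takes at least one byte
        int count = 0;
        int pos = offset;
        int end = offset + length;
        while (pos < end) {
            int b = bytes[pos++];
            int v = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = bytes[pos++];
                v |= (b & 0x7F) << shift;
            }
            result[count++] = v;
        }
        return Arrays.copyOf(result, count);
    }
}
```

With this sketch, encoding {3, 4, 300} produces four bytes, since 300 needs 
two, and decoding those bytes returns the original three values. A different 
scheme could be swapped in behind the same two method shapes.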

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

