[ 
https://issues.apache.org/jira/browse/LUCENE-5989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481268#comment-14481268
 ] 

Robert Muir commented on LUCENE-5989:
-------------------------------------

If we fix the .document API so that a StringField can carry a binary value, 
it might also help the merge code.

Currently the StoredFieldVisitor returns strings as java.lang.String, which is 
wasteful for the default merge implementation (it must decode and re-encode 
each value). If we could remove this and let the visitor deal with the raw 
bytes, the default merge could avoid the decode/re-encode, and we might even 
be able to nuke some of the specialized bulk merge logic that exists solely 
for reasons like this (at the least, we would speed up the worst case). I 
tried to look at this recently and the .document API stopped me. 
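
To make the decode/re-encode cost concrete, here is a rough sketch of the two 
visitor shapes. It uses only the JDK: the interfaces and method names below 
are hypothetical stand-ins, not Lucene's actual StoredFieldVisitor API, and 
the "merge" is just a copy of one stored value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: none of these types are Lucene classes.
final class StoredStringMergeSketch {

    /** Visitor that is handed a fully decoded String. */
    interface StringVisitor {
        void stringField(String fieldName, String value);
    }

    /** Visitor that is handed the stored UTF-8 bytes untouched. */
    interface BytesVisitor {
        void stringField(String fieldName, byte[] utf8Value);
    }

    /** Reader side: must decode the stored bytes before calling the visitor. */
    static void visitAsString(String fieldName, byte[] storedUtf8, StringVisitor visitor) {
        visitor.stringField(fieldName, new String(storedUtf8, StandardCharsets.UTF_8));
    }

    /** Reader side: hands the stored bytes over without decoding. */
    static void visitAsBytes(String fieldName, byte[] storedUtf8, BytesVisitor visitor) {
        visitor.stringField(fieldName, storedUtf8);
    }

    /** Merge via the String visitor: decode on read, re-encode on write. */
    static byte[] mergeViaString(String fieldName, byte[] storedUtf8) {
        byte[][] out = new byte[1][];
        visitAsString(fieldName, storedUtf8,
            (name, value) -> out[0] = value.getBytes(StandardCharsets.UTF_8));
        return out[0];
    }

    /** Merge via the bytes visitor: a plain copy, no charset work. */
    static byte[] mergeViaBytes(String fieldName, byte[] storedUtf8) {
        byte[][] out = new byte[1][];
        visitAsBytes(fieldName, storedUtf8, (name, utf8) -> out[0] = utf8.clone());
        return out[0];
    }

    public static void main(String[] args) {
        byte[] stored = "h\u00e9llo w\u00f6rld".getBytes(StandardCharsets.UTF_8);
        byte[] viaString = mergeViaString("title", stored);
        byte[] viaBytes = mergeViaBytes("title", stored);
        // Both paths write the same bytes for valid UTF-8 input.
        System.out.println(Arrays.equals(viaString, viaBytes));
    }
}
```

For valid UTF-8 both paths produce the same output bytes, so the String 
round trip in the first path is pure overhead during a merge.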

Not something we have to fix here, but just something related to think about 
when looking at how to change it.

> Add BinaryField, to index a single binary token
> -----------------------------------------------
>
>                 Key: LUCENE-5989
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5989
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, Trunk
>
>         Attachments: LUCENE-5989.patch
>
>
> 5 years ago (LUCENE-1458) we "enabled" fully binary terms in the
> lowest levels of Lucene (the codec APIs) yet today, actually adding an
> arbitrary byte[] binary term during indexing is far from simple: you
> must make a custom Field with a custom TokenStream and a custom
> TermToBytesRefAttribute, as far as I know.
> This is supremely expert; I wonder if anyone out there has succeeded
> in doing so?
> I think we should make indexing a single byte[] as simple as indexing
> a single String.
> This is a precursor for issues like LUCENE-5596 (encoding an IPv6
> address as byte[16]) and LUCENE-5879 (encoding native numeric values
> in their simple binary form).
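
As an illustration of the IPv6 case mentioned in the description, the single 
binary token would presumably be the address's 16 raw bytes. The sketch below 
produces that byte[16] using only the JDK. The class and method names are 
hypothetical, and it does not show how the bytes would be handed to Lucene, 
since that API is what the issue is proposing.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Lucene.
final class Ipv6TermSketch {

    /** Returns the 16-byte, network-order form of an IPv6 literal such as "2001:db8::1". */
    static byte[] encode(String ipv6Literal) {
        // Require a colon so the input is treated as an IPv6 literal
        // and never as a host name that would trigger a DNS lookup.
        if (ipv6Literal.indexOf(':') < 0) {
            throw new IllegalArgumentException("not an IPv6 literal: " + ipv6Literal);
        }
        final byte[] bytes;
        try {
            bytes = InetAddress.getByName(ipv6Literal).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("invalid IPv6 literal: " + ipv6Literal, e);
        }
        // IPv4-mapped literals come back as 4-byte IPv4 addresses; reject them here.
        if (bytes.length != 16) {
            throw new IllegalArgumentException("not a 16-byte IPv6 address: " + ipv6Literal);
        }
        return bytes;
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : encode("2001:db8::1")) {
            hex.append(String.format("%02x", b & 0xff));
        }
        System.out.println(hex);
    }
}
```

Because the result is a fixed-width, big-endian byte array, comparing two 
encoded addresses byte by byte (as unsigned values) matches their numeric 
order.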



