[ https://issues.apache.org/jira/browse/LUCENE-5989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483065#comment-14483065 ]
Uwe Schindler commented on LUCENE-5989:
---------------------------------------
bq. I tried to fix BinaryTokenStream's attr to be "proper" as Uwe Schindler
suggested, but ran into problems because this BytesRef is pre-shared up front
to consumers, so we can't null it in clear...
I don't think this is a problem here, because the TokenStream is only used
internally and is never visible to the outside (is it?). Another thing is
that Attribute's copyTo() does not deep clone, but this is not an issue either,
because nobody has the chance to copy this TokenStream anywhere else.
[~shaie] and I fixed TokenStreams in another issue, where payloads were not
cloned (see the changelog; I don't have the issue number).
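The two points above (a value object that is handed out up front cannot be
nulled in clear(), and a copyTo() that does not deep clone shares its bytes)
can be illustrated with a minimal plain-Java sketch. It does not use Lucene:
SharedBytes and BinaryAttr are hypothetical stand-ins for BytesRef and the
attribute, and copyToShallow()/copyToDeep() are names invented here for
illustration only.

```java
import java.util.Arrays;

// Hypothetical stand-in for a mutable byte slice such as BytesRef.
final class SharedBytes {
    byte[] bytes = new byte[0];
    int length = 0;
}

// Sketch of an attribute whose value object is handed out once, up front.
final class BinaryAttr {
    private final SharedBytes shared = new SharedBytes();

    SharedBytes getShared() { return shared; }

    void set(byte[] value) {
        shared.bytes = value;
        shared.length = value.length;
    }

    // clear() can only reset the contents; swapping in null or a new object
    // would go unnoticed by consumers already holding `shared`.
    void clear() {
        shared.bytes = new byte[0];
        shared.length = 0;
    }

    // Shallow copy: the target ends up pointing at the same byte[].
    void copyToShallow(BinaryAttr target) {
        target.set(shared.bytes);
    }

    // Deep copy: the target gets its own byte[].
    void copyToDeep(BinaryAttr target) {
        target.set(Arrays.copyOf(shared.bytes, shared.length));
    }
}

final class SharedBytesDemo {
    public static void main(String[] args) {
        BinaryAttr attr = new BinaryAttr();
        SharedBytes seenByConsumer = attr.getShared(); // fetched once, up front
        attr.set(new byte[] {1, 2, 3});
        System.out.println(seenByConsumer.length);     // consumer sees the update

        BinaryAttr shallow = new BinaryAttr();
        BinaryAttr deep = new BinaryAttr();
        attr.copyToShallow(shallow);
        attr.copyToDeep(deep);
        attr.getShared().bytes[0] = 9;                 // mutate the source in place
        System.out.println(shallow.getShared().bytes[0]); // follows the source
        System.out.println(deep.getShared().bytes[0]);    // keeps its own copy
    }
}
```

As long as the stream stays internal and nobody copies it, the shallow
behavior is harmless, which is the argument made above.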
In general we should fix the TermToBytesRefAttribute and remove the horrible
fillBytesRef(), which was needed in Lucene 4.x because of some early Lucene 3
compatibility. It makes the attribute hard to use, so we should get rid of it.
TermToBytesRefAttribute should only have a single method, getBytesRef(), that
returns the BytesRef.
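To make the difference concrete, here is a minimal plain-Java sketch of the
two contract styles. It is not Lucene code: Bytes, TwoStepTermAttribute,
SingleMethodTermAttribute and the two implementations are hypothetical
stand-ins; only the method names fillBytesRef() and getBytesRef() come from
the discussion above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for BytesRef: a slice of a byte array.
final class Bytes {
    byte[] bytes = new byte[0];
    int offset = 0;
    int length = 0;

    byte[] toArray() {
        return Arrays.copyOfRange(bytes, offset, offset + length);
    }
}

// 4.x-style contract (sketch): the consumer must call fillBytesRef()
// before the value returned by getBytesRef() is valid.
interface TwoStepTermAttribute {
    void fillBytesRef();
    Bytes getBytesRef();
}

// Proposed contract (sketch): one method that returns the current term bytes.
interface SingleMethodTermAttribute {
    Bytes getBytesRef();
}

final class TwoStepTerm implements TwoStepTermAttribute {
    private final Bytes ref = new Bytes();
    private String term = "";

    void setTerm(String term) { this.term = term; }

    @Override public void fillBytesRef() {
        ref.bytes = term.getBytes(StandardCharsets.UTF_8);
        ref.offset = 0;
        ref.length = ref.bytes.length;
    }

    @Override public Bytes getBytesRef() { return ref; }
}

final class SingleMethodTerm implements SingleMethodTermAttribute {
    private final Bytes ref = new Bytes();

    // The bytes are kept current on every set, so no separate fill step exists.
    void setTerm(String term) {
        ref.bytes = term.getBytes(StandardCharsets.UTF_8);
        ref.offset = 0;
        ref.length = ref.bytes.length;
    }

    @Override public Bytes getBytesRef() { return ref; }
}

final class TermAttributeDemo {
    public static void main(String[] args) {
        TwoStepTerm old = new TwoStepTerm();
        old.setTerm("abc");
        // Forgetting fillBytesRef() leaves the ref stale (still empty here).
        System.out.println(old.getBytesRef().length);
        old.fillBytesRef();
        System.out.println(old.getBytesRef().length);

        SingleMethodTerm simple = new SingleMethodTerm();
        simple.setTerm("abc");
        System.out.println(simple.getBytesRef().length);
    }
}
```

With the single-method shape there is no call order for a consumer to get
wrong, which is the usability point being made.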
Generally I am fine with this. The issues Robert mentioned should be handled
in a separate issue.
> Add BinaryField, to index a single binary token
> -----------------------------------------------
>
> Key: LUCENE-5989
> URL: https://issues.apache.org/jira/browse/LUCENE-5989
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 5.0, Trunk
>
> Attachments: LUCENE-5989.patch, LUCENE-5989.patch
>
>
> 5 years ago (LUCENE-1458) we "enabled" fully binary terms in the
> lowest levels of Lucene (the codec APIs) yet today, actually adding an
> arbitrary byte[] binary term during indexing is far from simple: you
> must make a custom Field with a custom TokenStream and a custom
> TermToBytesRefAttribute, as far as I know.
> This is supremely expert, I wonder if anyone out there has succeeded
> in doing so?
> I think we should make indexing a single byte[] as simple as indexing
> a single String.
> This is a precursor for issues like LUCENE-5596 (encoding IPv6
> addresses as byte[16]) and LUCENE-5879 (encoding native numeric values
> in their simple binary form).