[
https://issues.apache.org/jira/browse/LUCENE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15103119#comment-15103119
]
Paul Elschot commented on LUCENE-5687:
--------------------------------------
I considered that, but it does not really fit my use case.
TeeSinkTokenFilter buffers its input token states and then the sinks use this
buffer.
What I need are token state buffers after the split, in part because the input
can come from another source, such as the {{XMLEventReader}} parser.
In the XML case the token state needs to be created after parsing, and this
depends on the parser state.
The position increments do need care when doing such a split; see
LUCENE-5627 for that.
I am not really happy with the current patch. After LUCENE-6973,
{{PrefillTokenStream}} is too similar to
{{TeeSinkTokenFilter.SinkTokenStream}}: it is now hardly more than a
wrapper around the new {{States}} that provides public methods.
How about making {{States}} public, for example as {{TokenStates}}, instead of
introducing {{PrefillTokenStream}}?
That would allow reusing the token state buffer of {{TeeSinkTokenFilter}}
after splitting.
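To make the idea concrete, here is a minimal sketch in plain Java, not Lucene code, of buffering token states after a split. The names {{TokenState}}, {{TokenStates}} and {{TokenSplit}} are hypothetical stand-ins: a real version would hold captured attribute states rather than a term string, and the way position increments are carried over here is only one possible approach, not necessarily the one from LUCENE-5627.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in for a captured token state: term text plus position increment. */
final class TokenState {
    final String term;
    final int positionIncrement;

    TokenState(String term, int positionIncrement) {
        this.term = term;
        this.positionIncrement = positionIncrement;
    }
}

/** Hypothetical public buffer of token states, filled after the split and replayed later. */
final class TokenStates {
    private final List<TokenState> states = new ArrayList<>();

    void add(TokenState state) {
        states.add(state);
    }

    List<TokenState> asList() {
        return Collections.unmodifiableList(states);
    }
}

final class TokenSplit {
    /**
     * Routes each input token to one of two buffers. Increments of tokens that
     * went to the other buffer are carried over, so absolute positions are kept.
     */
    static TokenStates[] split(List<TokenState> input, Predicate<String> toFirst) {
        TokenStates[] out = { new TokenStates(), new TokenStates() };
        // Increments accumulated since the last token added to each buffer.
        int[] pending = { 0, 0 };
        for (TokenState token : input) {
            pending[0] += token.positionIncrement;
            pending[1] += token.positionIncrement;
            int target = toFirst.test(token.term) ? 0 : 1;
            out[target].add(new TokenState(token.term, pending[target]));
            pending[target] = 0;
        }
        return out;
    }
}
```

In this sketch the buffers are filled by whatever does the splitting, so the tokens could equally be created from parser events instead of an upstream token stream.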
> Add PrefillTokenStream in analysis common module
> ------------------------------------------------
>
> Key: LUCENE-5687
> URL: https://issues.apache.org/jira/browse/LUCENE-5687
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/analysis
> Affects Versions: 4.9
> Reporter: Paul Elschot
> Priority: Minor
> Fix For: 4.9
>
> Attachments: LUCENE-5687.patch, LUCENE-5687.patch, LUCENE-5687.patch
>
>