[ 
https://issues.apache.org/jira/browse/LUCENE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15103119#comment-15103119
 ] 

Paul Elschot commented on LUCENE-5687:
--------------------------------------

I considered that, but it does not really fit my use case.
TeeSinkTokenFilter buffers its input token states and then the sinks use this 
buffer.

What I need is token state buffers after the split, also because the input can 
come from another source like the {{XMLEventReader}} parser.
In the XML case the token state needs to be created after parsing, and this 
depends on the parser state.

The position increments indeed need care when doing such a split; see 
LUCENE-5627 for that.

I am not really happy with the current patch. After LUCENE-6973 
{{PrefillTokenStream}} is too similar to 
{{TeeSinkTokenFilter.SinkTokenStream}}, and it is now hardly more than a 
wrapper around the new {{States}} that provides public methods.

How about making {{States}} public, for example as {{TokenStates}}, instead of 
introducing {{PrefillTokenStream}}?
That would allow reusing the token state buffer of {{TeeSinkTokenFilter}} 
after splitting.
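
To make the idea concrete, here is a rough sketch of what such a public 
buffer could look like. It is plain Java with no Lucene dependency, and it 
is not the code from the patch: the name {{TokenStates}} is only the 
suggestion above, the methods {{add}}, {{finish}}, {{replay}} and {{reset}} 
are hypothetical, and the type parameter {{S}} merely stands in for a 
captured token state such as {{AttributeSource.State}}.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical sketch of a public token state buffer.
 * S is a placeholder for a captured token state (assumption: in Lucene
 * this would be AttributeSource.State).
 */
final class TokenStates<S> {
  private final List<S> states = new ArrayList<>();
  private boolean finished = false;

  /** Called by the producer (a token stream, or a parser) once per token. */
  void add(S state) {
    if (finished) {
      throw new IllegalStateException("buffer is already finished");
    }
    states.add(state);
  }

  /** Marks the end of the input; only after this can the buffer be replayed. */
  void finish() {
    finished = true;
  }

  /** Returns a fresh read-only iterator, so each consumer replays independently. */
  Iterator<S> replay() {
    if (!finished) {
      throw new IllegalStateException("buffer is not finished yet");
    }
    return Collections.unmodifiableList(states).iterator();
  }

  /** Clears the buffer so that it can be filled again for the next document. */
  void reset() {
    states.clear();
    finished = false;
  }
}
```

With something like this, the code that does the split would own one buffer 
per output and fill each of them directly, whether the states come from a 
token stream or are created from the parser state.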



> Add PrefillTokenStream in analysis common module
> ------------------------------------------------
>
>                 Key: LUCENE-5687
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5687
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 4.9
>            Reporter: Paul Elschot
>            Priority: Minor
>             Fix For: 4.9
>
>         Attachments: LUCENE-5687.patch, LUCENE-5687.patch, LUCENE-5687.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

