[ 
https://issues.apache.org/jira/browse/LUCENE-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053118#comment-16053118
 ] 

Uwe Schindler commented on LUCENE-7877:
---------------------------------------

To me this looks good, although I have not verified all corner cases like how 
attributes behave when switching streams. I have just some comments for minor 
improvements, also related to [~jpountz]'s comments:

bq. should we only add OffsetAttribute to the concatenated stream if it can be 
found on the sources?

This is a good idea, but I don't think its necessary for 2 reasons:
- TokenFilters don't do this, too (and this is more like a TokenFilter taking 
multiple inputs)
- The OffsetAttribute is always available by default with the combined 
AttributeSource.

bq. should we use a single copyTo call instead of captureState+restoreState on 
every token?

Yes! In addition captureState and restoreState depend on order of tokens and 
should not be used cross attributesources. So I'd call copyTo also in the 
combineSourcesCheck. This is more safe for TokenStreams that added attributes 
in different order.

bq. could you use IOUtils.close to close all sources at once so that we keep 
trying to close other sources if any of them throws an exception upon closing?

Yes, TokenStream implements Closeable, so this shuld work. And super.close() 
should be in a finally block (although not so important).

There is something I'd like to change: I'd cache source's OffsetAttributes also 
in ctor in an offsetAttributes[] with same size like the sources[] array. I'd 
also use "addAttribute" because otherwise you may miss those from sources that 
add the attributes later (but those may break this stream anyways - we can't do 
anything against that!).

Finally, I am not sure if we should expose the first constructor to the users 
(I'd make it private) or at least add some checks that the AttributeSources are 
in sync. Otherwise this one is likely to fail for people who don't know the 
Attribute internals. The "combineSources" static is the only way to create an 
attributesource. If you just want to mimic other Tokenizers that take an 
attributesource: This is wrong here, as this item is more like a TokenFilter: 
It has to take the attributes from input(s). I'd like to have combineSources 
throw a more user-friendly exception. So if anything breaks in combineSources, 
throw some IAE with a more usefull explanation (including the root cause).

> Replace PrefixAwareTokenStream
> ------------------------------
>
>                 Key: LUCENE-7877
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7877
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>         Attachments: LUCENE-7877.patch
>
>
> PrefixAwareTokenStream/PrefixAndSuffixAwareTokenStream use the deprecated and 
> broken Token class, which means they will not work with custom attributes.  
> We should fix/replace them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to