[
https://issues.apache.org/jira/browse/LUCENE-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053118#comment-16053118
]
Uwe Schindler commented on LUCENE-7877:
---------------------------------------
To me this looks good, although I have not verified all corner cases like how
attributes behave when switching streams. I have just some comments for minor
improvements, also related to [~jpountz]'s comments:
bq. should we only add OffsetAttribute to the concatenated stream if it can be
found on the sources?
This is a good idea, but I don't think its necessary for 2 reasons:
- TokenFilters don't do this, too (and this is more like a TokenFilter taking
multiple inputs)
- The OffsetAttribute is always available by default with the combined
AttributeSource.
bq. should we use a single copyTo call instead of captureState+restoreState on
every token?
Yes! In addition captureState and restoreState depend on order of tokens and
should not be used cross attributesources. So I'd call copyTo also in the
combineSourcesCheck. This is more safe for TokenStreams that added attributes
in different order.
bq. could you use IOUtils.close to close all sources at once so that we keep
trying to close other sources if any of them throws an exception upon closing?
Yes, TokenStream implements Closeable, so this shuld work. And super.close()
should be in a finally block (although not so important).
There is something I'd like to change: I'd cache source's OffsetAttributes also
in ctor in an offsetAttributes[] with same size like the sources[] array. I'd
also use "addAttribute" because otherwise you may miss those from sources that
add the attributes later (but those may break this stream anyways - we can't do
anything against that!).
Finally, I am not sure if we should expose the first constructor to the users
(I'd make it private) or at least add some checks that the AttributeSources are
in sync. Otherwise this one is likely to fail for people who don't know the
Attribute internals. The "combineSources" static is the only way to create an
attributesource. If you just want to mimic other Tokenizers that take an
attributesource: This is wrong here, as this item is more like a TokenFilter:
It has to take the attributes from input(s). I'd like to have combineSources
throw a more user-friendly exception. So if anything breaks in combineSources,
throw some IAE with a more usefull explanation (including the root cause).
> Replace PrefixAwareTokenStream
> ------------------------------
>
> Key: LUCENE-7877
> URL: https://issues.apache.org/jira/browse/LUCENE-7877
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Attachments: LUCENE-7877.patch
>
>
> PrefixAwareTokenStream/PrefixAndSuffixAwareTokenStream use the deprecated and
> broken Token class, which means they will not work with custom attributes.
> We should fix/replace them.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]