[
https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472873#comment-16472873
]
Steve Rowe commented on LUCENE-8273:
------------------------------------
I've belatedly reviewed the changes committed under this issue - awesome
additions!
I noticed a few small things that could be improved, and in putting together a
patch I found what I think is a bug/omission in the design:
{{TermExclusionFilterFactory}} can't be used outside of {{CustomAnalyzer}}
because {{ConditionalTokenFilterFactory.inform()}} expects
{{setInnerFilters()}} to have already been called, but
{{TermExclusionFilterFactory}} doesn't have the ability to do that. As a
result, the protected word list can't be loaded. AFAICT this will prevent
{{TermExclusionFilterFactory}} from being used in Solr, e.g.
My idea to address the problem is to allow {{TermExclusionFilterFactory}} to
accept (but not require) as args wrapped filters and those filters' args, and
then call {{setInnerFilters()}} from the ctor. I've added a factory test suite
that appears to validate the idea. See javadocs in the patch for interface
details.
Because of the way I produced it, the patch (forthcoming) includes several
other changes, some of which may be controversial. I'd be happy to factor out
any of these changes if you don't agree with all of them, [~romseygeek]. The
other changes, in no particular order:
* I think {{TermExclusionFilter}} should be renamed to {{ProtectedTermFilter}}
(or similar) - forcing devs&users to deal with *both* "protected" and
"excluded" seems pointless to me; it should be either one or the other
everywhere.
* {{ProtectedTermFilter}} ctor should {{require()}} the protected terms file
param rather than {{get()}} it, since it is in fact required.
* Added {{TestConditionalTokenFilter.testMultipleConditionalFilters()}}
* The boolean sense of {{ConditionalTokenFilter.shouldFilter()}} is backward
in the class javadocs: it says that shouldFilter:false will execute the wrapped
filters, but AFAICT it's the opposite.
* Some other minor javadoc improvements.
If the approach is okay, there is a little bit more work to do: some additional
testing (more tests for graceful handling of bad input, and a Solr test), and
adding Solr ref guide doc. I haven't run precommit or all tests yet, so there
may be some outstanding problems.
> Add a ConditionalTokenFilter
> ----------------------------
>
> Key: LUCENE-8273
> URL: https://issues.apache.org/jira/browse/LUCENE-8273
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch,
> LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch,
> LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265. It would be useful to be able to wrap a TokenFilter
> in such a way that it could optionally be bypassed based on the current state
> of the TokenStream. This could be used to, for example, only apply
> WordDelimiterFilter to terms that contain hyphens.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]