[jira] Commented: (LUCENE-2470) Add conditional branching/merging to Lucene's analysis pipeline

Steven Rowe (JIRA) Wed, 19 May 2010 13:47:19 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869326#action_12869326
 ]


Steven Rowe commented on LUCENE-2470:
-------------------------------------

bq. I think the real key is, if we can make it nice to do this declaratively, 
for example in a Solr schema definition.

I agree.

We could start with a BranchingStageFactory that takes a structured 
conditional processing specification, but I suspect that declarative 
specification of the entire analysis pipeline, à la Solr, will turn out to be 
the way to go.



> Add conditional branching/merging to Lucene's analysis pipeline
> ---------------------------------------------------------------
>
>                 Key: LUCENE-2470
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2470
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Analysis
>    Affects Versions: 4.0
>            Reporter: Steven Rowe
>            Priority: Minor
>
> Captured from a #lucene brainstorming session with Robert Muir:
> Lucene's analysis pipeline would be more flexible if it were possible to 
> apply filter(s) to only part of an input stream's tokens, under 
> user-specifiable conditions (e.g. when a given token attribute has a 
> particular value) in a way that did not place that responsibility on 
> individual filters.
> Two use cases:
> # StandardAnalyzer could directly handle ideographic characters in the same 
> way as CJKTokenizer, which generates bigrams, if it could call ShingleFilter 
> only when TypeAttribute=<CJK>, or when Robert's new 
> ScriptAttribute=<Ideographic>.
> # Stemming might make sense for some stemmer/domain combinations only when 
> token length exceeds some threshold.  For example, a user could configure an 
> analyzer to stem only when CharTermAttribute length is greater than 4 
> characters.
> One potential way to achieve this conditional branching facility is with a 
> new kind of filter that can be configured with one or more following filters 
> and condition(s) under which the filter should be engaged.  This could be 
> called BranchingFilter.
> I think a MergingFilter, the inverse of BranchingFilter, is necessary in the 
> current pipeline architecture, to have a single pipeline endpoint.  A 
> MergingFilter might be useful in its own right, e.g. to collect document data 
> from multiple sources.  Perhaps a conditional merging facility would be 
> useful as well.
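
A minimal sketch of the BranchingFilter idea from the description above, 
using the second use case (stem only when the term is longer than 4 
characters). It is plain Java with no Lucene dependency: Token, branch() and 
toyStem() are hypothetical stand-ins, not Lucene classes, and the "stemmer" 
only strips a trailing "s". It also covers only one-to-one filters; a 
one-to-many filter such as ShingleFilter would need a real token-stream 
design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

public class BranchingSketch {

    /** Hypothetical stand-in for a token: term text plus a type label. */
    static final class Token {
        final String term;
        final String type;

        Token(String term, String type) {
            this.term = term;
            this.type = type;
        }
    }

    /**
     * Applies branchFilter only to tokens that satisfy condition; every other
     * token passes through unchanged. Writing both kinds of token to one
     * output list, in input order, plays the role of the merge step.
     */
    static List<Token> branch(List<Token> input,
                              Predicate<Token> condition,
                              UnaryOperator<Token> branchFilter) {
        List<Token> out = new ArrayList<>(input.size());
        for (Token t : input) {
            out.add(condition.test(t) ? branchFilter.apply(t) : t);
        }
        return out;
    }

    /** Toy stemmer (assumption for illustration): strips one trailing "s". */
    static Token toyStem(Token t) {
        if (t.term.endsWith("s")) {
            return new Token(t.term.substring(0, t.term.length() - 1), t.type);
        }
        return t;
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
                new Token("bus", "<ALPHANUM>"),
                new Token("pipelines", "<ALPHANUM>"));

        // The condition lives outside the stemmer, as the issue proposes.
        List<Token> result =
                branch(tokens, t -> t.term.length() > 4, BranchingSketch::toyStem);

        for (Token t : result) {
            System.out.println(t.term);
        }
    }
}
```

Run as written, this prints "bus" (too short to be stemmed) and then 
"pipeline". The point of the sketch is that the length check is supplied by 
the caller, so the stemmer itself stays unaware of the condition.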



