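[jira] Commented: (SOLR-2119) IndexSchema should log warning if <analyzer> is declared with charfilter/tokenizer/tokenfilter out of order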

Robert Muir (JIRA) Tue, 14 Sep 2010 16:13:16 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909511#action_12909511
 ]


Robert Muir commented on SOLR-2119:
-----------------------------------

{quote}
There seems to be a segment of the user population that has a hard time 
understanding the distinction between a charfilter, a tokenizer, and a 
tokenfilter - while we can certainly try to improve the documentation about 
what exactly each does, and when they take effect in the analysis chain, one 
other thing we should do is try to educate people when they construct their 
<analyzer> in a way that doesn't make any sense.
{quote}

I think we should do both; this is a great idea.

{quote}
(we could easily make such a situation fail to initialize, but I'm not 
convinced that would be the best course of action, since some people may have 
schemas where they have declared a charFilter or tokenizer out of order 
relative to their tokenFilters, but are still getting "correct" results that 
work for them, and breaking their instance on upgrade doesn't seem like it 
would be productive)
{quote}

I would prefer a hard error. I think someone who doesn't understand what 
tokenizers and filters do likely isn't looking at their log files either.
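
To make the "hard error" option concrete, here is a minimal sketch of an ordering check. It is not Solr code: the function and exception names are hypothetical, and the expected order (charFilter, then tokenizer, then filter) is assumed from the issue description.

```python
import xml.etree.ElementTree as ET

# Assumed order of children inside <analyzer>, taken from the issue text:
# charFilters first, then the tokenizer, then token filters (<filter>).
STAGE_RANK = {"charFilter": 0, "tokenizer": 1, "filter": 2}


class AnalyzerOrderError(ValueError):
    """Hypothetical error type for this sketch; not part of Solr."""


def check_analyzer_order(analyzer_xml: str) -> None:
    """Raise AnalyzerOrderError if known children appear out of order."""
    analyzer = ET.fromstring(analyzer_xml)
    highest_rank = -1
    highest_tag = None
    for child in analyzer:
        rank = STAGE_RANK.get(child.tag)
        if rank is None:
            continue  # unknown element names are a separate problem
        if rank < highest_rank:
            # An earlier stage was declared after a later one: fail hard.
            raise AnalyzerOrderError(
                f"<{child.tag}> declared after <{highest_tag}>"
            )
        if rank > highest_rank:
            highest_rank, highest_tag = rank, child.tag


GOOD = """
<analyzer>
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
"""

BAD = """
<analyzer>
  <filter class="solr.LowerCaseFilterFactory"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
"""
```

Calling `check_analyzer_order(GOOD)` returns quietly, while `check_analyzer_order(BAD)` raises. Turning the `raise` into a logged warning would give the softer behavior proposed in the issue.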

In my opinion, Solr should be more picky about its configuration. Often, if 
I haven't had enough sleep, I will type tokenFilter instead of filter, and 
Solr simply ignores it completely instead of raising an error.

And I can't be the only one who does this. It's not obvious that tokenizer = 
Tokenizer, charFilter = CharFilter, analyzer = Analyzer, but filter = 
TokenFilter.
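
As an illustration of the stricter behavior, the sketch below flags unrecognized children of <analyzer> instead of silently skipping them. The set of accepted names is an assumption limited to the three element names mentioned in this thread, and `unknown_children` is a hypothetical helper, not a Solr API.

```python
import xml.etree.ElementTree as ET
from typing import List

# Schema element name -> kind of analysis component it configures,
# following the mapping described in the comment above.
# Assumption: only these three names are accepted inside <analyzer>.
KNOWN_CHILDREN = {
    "charFilter": "CharFilter",
    "tokenizer": "Tokenizer",
    "filter": "TokenFilter",
}


def unknown_children(analyzer_xml: str) -> List[str]:
    """Return the tag names of <analyzer> children that are not recognized."""
    analyzer = ET.fromstring(analyzer_xml)
    return [child.tag for child in analyzer if child.tag not in KNOWN_CHILDREN]


# The typo described above: <tokenFilter> written where <filter> was meant.
SCHEMA_SNIPPET = """
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <tokenFilter class="solr.LowerCaseFilterFactory"/>
</analyzer>
"""

bad = unknown_children(SCHEMA_SNIPPET)
if bad:
    # A strict loader could refuse to start here rather than just report.
    print("unrecognized <analyzer> children:", bad)
```

A strict schema loader could treat a non-empty result as a startup failure, which would catch the tokenFilter/filter mix-up right away.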


> IndexSchema should log warning if <analyzer> is declared with 
> charfilter/tokenizer/tokenfilter out of order
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-2119
>                 URL: https://issues.apache.org/jira/browse/SOLR-2119
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Hoss Man
>
> There seems to be a segment of the user population that has a hard time 
> understanding the distinction between a charfilter, a tokenizer, and a 
> tokenfilter -- while we can certainly try to improve the documentation about 
> what exactly each does, and when they take effect in the analysis chain, one 
> other thing we should do is try to educate people when they construct their 
> <analyzer> in a way that doesn't make any sense.
> At the moment, some people are attempting to do things like "move the Foo 
> <tokenFilter/> before the <tokenizer/>" to try and get certain behavior ... 
> at a minimum we should log a warning in this case that doing that doesn't 
> have the desired effect
> (we could easily make such a situation fail to initialize, but I'm not 
> convinced that would be the best course of action, since some people may have 
> schemas where they have declared a charFilter or tokenizer out of order 
> relative to their tokenFilters, but are still getting "correct" results that 
> work for them, and breaking their instance on upgrade doesn't seem like it 
> would be productive)
