[jira] [Commented] (SOLR-2519) Improve the defaults for the "text" field type in default schema.xml

Michael McCandless (JIRA) Fri, 27 May 2011 09:01:33 -0700

    [ https://issues.apache.org/jira/browse/SOLR-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040298#comment-13040298 ]


Michael McCandless commented on SOLR-2519:
------------------------------------------

bq. I think we need to stop kidding ourselves about example/default and just 
recognize that 99.99999999999% of users just use the example as their default 
configuration. Guys, the example is the default, there is simply no argument, 
this is the reality! So I think we should present reasonable field type names 
such as text_en etc. Please don't waste any more of our time trying to convince 
users that the default is actually an example, it's a default.

OK I agree.  So I'll rename the fields back to text_XX (instead of 
text_example_XX).

bq. 3. The aggressive analysis is totally unnecessary and gives bad results, 
this is not 1985... Let's drop the porter stemmer and the stopwords list and 
replace them with less aggressive defaults such as s-stemmer and a commongrams 
configuration.

Sounds great!  Can you post the analyzer XML for this....?  Kinda out of my 
league at this point :)
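
For reference, a minimal sketch of what such a less aggressive analyzer chain
might look like in schema.xml. This is an illustration under assumptions, not
the configuration from the attached patch: the field type name text_en, the
stopwords.txt file name, and the filter order are all assumed here, with
EnglishMinimalStemFilterFactory standing in for the "s-stemmer".

```xml
<!-- Hypothetical field type: light stemming plus common-grams instead of
     Porter stemming plus stopword removal. Names and order are assumptions. -->
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Keeps common words and also emits bigrams such as "the_cat" -->
    <filter class="solr.CommonGramsFilterFactory"
            words="stopwords.txt" ignoreCase="true"/>
    <!-- Minimal plural stemmer, far less aggressive than Porter -->
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Query-side counterpart of the common-grams filter -->
    <filter class="solr.CommonGramsQueryFilterFactory"
            words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>
```

The common-grams filters are placed before the stemmer in this sketch so that
the word list is matched against unstemmed tokens; whether that is the best
order for a shipped default is a judgment call.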

bq. 4. I do not think the default query parser should be the lucene one, if we 
have a fancy one (edismax?) that happily handles user input without 
exceptions... why not just default to the best we have to offer?!

+1
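
As a rough illustration of the suggestion above, the default parser could be
switched in solrconfig.xml through the defType parameter. The handler name
/select and the qf field text_en below are assumptions for the sketch, not
values taken from the patch.

```xml
<!-- Hypothetical request handler defaults: use edismax unless the
     request overrides defType. Handler name and qf are assumptions. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">text_en</str>
  </lst>
</requestHandler>
```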

Robert maybe you can take the patch and iterate w/ these changes...?


> Improve the defaults for the "text" field type in default schema.xml
> --------------------------------------------------------------------
>
>                 Key: SOLR-2519
>                 URL: https://issues.apache.org/jira/browse/SOLR-2519
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.2, 4.0
>
>         Attachments: SOLR-2519.patch, SOLR-2519.patch, SOLR-2519.patch
>
>
> Spinoff from: http://lucene.markmail.org/thread/ww6mhfi3rfpngmc5
> The text fieldType in schema.xml is unusable for non-whitespace
> languages, because it has the dangerous auto-phrase feature (of
> Lucene's QP -- see LUCENE-2458) enabled.
> Lucene leaves this off by default, as does ElasticSearch
> (http://www.elasticsearch.org/).
> Furthermore, the "text" fieldType uses WhitespaceTokenizer when
> StandardTokenizer is a better cross-language default.
> Until we have language specific field types, I think we should fix
> the "text" fieldType to work well for all languages, by:
>   * Switching from WhitespaceTokenizer to StandardTokenizer
>   * Turning off auto-phrase
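
The two changes listed in the description could be sketched in schema.xml
roughly as follows. This is a minimal example under assumptions, not the
attached patch: it assumes the autoGeneratePhraseQueries attribute on the
field type is the switch for the auto-phrase behavior, and the lowercase
filter is included only to make the chain usable.

```xml
<!-- Hypothetical "text" field type: StandardTokenizer instead of
     WhitespaceTokenizer, with automatic phrase queries disabled. -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```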

