[ https://issues.apache.org/jira/browse/SOLR-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040298#comment-13040298 ]
Michael McCandless commented on SOLR-2519:
------------------------------------------

bq. I think we need to stop kidding ourselves about example/default and just recognize that 99.99999999999% of users just use the example as their default configuration. Guys, the example is the default, there is simply no argument, this is the reality! So I think we should present reasonable field type names such as text_en etc. Please don't waste any more of our time trying to convince users that the default is actually an example, it's a default.

OK I agree. So I'll rename the fields back to text_XX (instead of text_example_XX).

bq. 3. The aggressive analysis is totally unnecessary and gives bad results, this is not 1985... Let's drop the porter stemmer and the stopwords list and replace them with less aggressive defaults such as s-stemmer and a commongrams configuration.

Sounds great! Can you post the analyzer XML for this....? Kinda out of my league at this point :)

bq. 4. I do not think the default query parser should be the lucene one, if we have a fancy one (edismax?) that happily handles user input without exceptions... why not just default to the best we have to offer?!

+1

Robert maybe you can take the patch and iterate w/ these changes...?

> Improve the defaults for the "text" field type in default schema.xml
> --------------------------------------------------------------------
>
>                 Key: SOLR-2519
>                 URL: https://issues.apache.org/jira/browse/SOLR-2519
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.2, 4.0
>
>         Attachments: SOLR-2519.patch, SOLR-2519.patch, SOLR-2519.patch
>
>
> Spinoff from: http://lucene.markmail.org/thread/ww6mhfi3rfpngmc5
> The text fieldType in schema.xml is unusable for non-whitespace
> languages, because it has the dangerous auto-phrase feature (of
> Lucene's QP -- see LUCENE-2458) enabled.
> Lucene leaves this off by default, as does ElasticSearch
> (http://www.elasticsearch.org/).
> Furthermore, the "text" fieldType uses WhitespaceTokenizer when
> StandardTokenizer is a better cross-language default.
> Until we have language specific field types, I think we should fix
> the "text" fieldType to work well for all languages, by:
>   * Switching from WhitespaceTokenizer to StandardTokenizer
>   * Turning off auto-phrase
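
For illustration only, the changes discussed in this thread (StandardTokenizer, auto-phrase turned off, an s-stemmer and common grams in place of the Porter stemmer and stopword removal) might look roughly like the schema.xml fragment below. This is a sketch, not the patch attached to the issue. The field type name text_en follows the naming discussed in the comment; the stopwords.txt file name, the filter order, and the choice of EnglishMinimalStemFilterFactory as the "s-stemmer" are assumptions.

```xml
<!-- Hypothetical sketch of a less aggressive English field type.
     autoGeneratePhraseQueries="false" turns off the auto-phrase behavior. -->
<fieldType name="text_en" class="solr.TextField"
           positionIncrementGap="100"
           autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <!-- StandardTokenizer instead of WhitespaceTokenizer -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Assumption: stopwords.txt doubles as the common-words list.
         Common words are kept and also joined into bigrams rather than dropped. -->
    <filter class="solr.CommonGramsFilterFactory"
            words="stopwords.txt" ignoreCase="true"/>
    <!-- Assumption: minimal plural stemming used as the "s-stemmer" -->
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Query-side counterpart of the common grams filter -->
    <filter class="solr.CommonGramsQueryFilterFactory"
            words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Whether common grams should run before or after stemming, and which word list to use, would need to be settled against real relevance tests; the fragment only shows how the pieces fit together.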