[ https://issues.apache.org/jira/browse/SOLR-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034101#comment-13034101 ]
Michael McCandless commented on SOLR-2519:
------------------------------------------

I think the attached patch is a good starting point. It fixes the generic "text" fieldType to have good all-around defaults for all languages, so that non-whitespace languages work fine.

Then, I think we should iteratively add custom languages over time (as separate issues). We can, e.g., add text_en_autophrase, text_en, text_zh, etc. We should at least do a first sweep of the nice analyzers module and add fieldTypes for them.

This way we will eventually get to the ideal future where we have text_XX coverage for many languages.

> Improve the defaults for the "text" field type in default schema.xml
> --------------------------------------------------------------------
>
>                 Key: SOLR-2519
>                 URL: https://issues.apache.org/jira/browse/SOLR-2519
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.2, 4.0
>
>         Attachments: SOLR-2519.patch
>
>
> Spinoff from: http://lucene.markmail.org/thread/ww6mhfi3rfpngmc5
> The text fieldType in schema.xml is unusable for non-whitespace
> languages, because it has the dangerous auto-phrase feature (of
> Lucene's QP -- see LUCENE-2458) enabled.
> Lucene leaves this off by default, as does ElasticSearch
> (http://www.elasticsearch.org/).
> Furthermore, the "text" fieldType uses WhitespaceTokenizer when
> StandardTokenizer is a better cross-language default.
> Until we have language specific field types, I think we should fix
> the "text" fieldType to work well for all languages, by:
> * Switching from WhitespaceTokenizer to StandardTokenizer
> * Turning off auto-phrase
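
To make the two proposed changes concrete, here is a rough sketch of why whitespace tokenization and auto-phrase hurt non-whitespace languages. It is a toy model in plain Python, not the Lucene or Solr API: whitespace_tokenize, unicode_aware_tokenize, build_query and matches are hypothetical names invented for illustration, and the "unicode-aware" tokenizer only approximates StandardTokenizer by emitting each Han ideograph as its own token. The query parser is assumed to split the query on whitespace first and OR the resulting clauses together.

```python
import re

def whitespace_tokenize(text):
    """Split on whitespace only (toy stand-in for a whitespace tokenizer)."""
    return text.split()

# Toy stand-in for a Unicode-aware tokenizer (an assumption, not Lucene code):
# each Han ideograph is its own token, other word characters are grouped.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[^\W\u4e00-\u9fff]+")

def unicode_aware_tokenize(text):
    return _TOKEN.findall(text)

def build_query(query_text, tokenize, auto_phrase):
    """Return a list of (kind, tokens) clauses, OR'ed together.

    The parser splits on whitespace first. If one chunk analyzes into
    several tokens and auto_phrase is on, the chunk becomes a phrase clause;
    otherwise every token becomes its own term clause.
    """
    clauses = []
    for chunk in query_text.split():
        tokens = tokenize(chunk)
        if len(tokens) > 1 and auto_phrase:
            clauses.append(("PHRASE", tokens))
        else:
            clauses.extend(("TERM", [t]) for t in tokens)
    return clauses

def matches(doc_tokens, clauses):
    """True if any clause matches the document's token list."""
    def has_phrase(tokens):
        n = len(tokens)
        return any(doc_tokens[i:i + n] == tokens
                   for i in range(len(doc_tokens) - n + 1))
    return any(
        has_phrase(toks) if kind == "PHRASE" else toks[0] in doc_tokens
        for kind, toks in clauses
    )

doc_text = "人民的中国"   # document text with no whitespace between words
query = "中国人民"        # shares every query character, in a different order

# Whitespace tokenizer: the whole document is a single token, so even a
# query for a substring of it finds nothing.
ws_doc = whitespace_tokenize(doc_text)
ws_hit = matches(ws_doc, build_query("中国", whitespace_tokenize, False))

# Unicode-aware tokenizer with auto-phrase on: the query silently becomes
# one four-token phrase, which does not occur in the document.
doc = unicode_aware_tokenize(doc_text)
phrase_hit = matches(doc, build_query(query, unicode_aware_tokenize, True))

# Same tokenizer with auto-phrase off: the query is an OR of single terms.
term_hit = matches(doc, build_query(query, unicode_aware_tokenize, False))

print(ws_hit, phrase_hit, term_hit)
```

In this toy model only the last combination (unicode-aware tokenization with auto-phrase off) finds the document, which mirrors the two bullets in the issue description. Real relevance ranking, stop words and position increments are deliberately left out.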