+1. I've seen far too many implementations of Solr that blindly use the example configurations and then wonder why the results are surprising (WordDelimiterFilterFactory by itself has confused more people than I can recollect).
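For what it's worth, a deliberately chosen type, kept separate from the example ones, might look roughly like the sketch below. The name "text_custom_en" and the filter chain are made up for illustration, and I'm assuming a Solr version whose schema.xml accepts the autoGeneratePhraseQueries attribute. The tokenizer choice and the auto-phrase setting follow what SOLR-2519 (quoted below) proposes.

```xml
<!-- Hypothetical custom type: the name and filter chain are illustrative assumptions -->
<fieldType name="text_custom_en" class="solr.TextField"
           positionIncrementGap="100"
           autoGeneratePhraseQueries="false">
  <analyzer>
    <!-- StandardTokenizer instead of WhitespaceTokenizer, per the SOLR-2519 proposal -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercasing only; add stemming, synonyms, etc. once you know you need them -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Starting from a short chain like this and adding filters one at a time makes it easier to see which filter is responsible when results look odd.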
Although, just to contradict myself, I guess if people don't really look at the configs, they deserve the consequences...

And to contra-contradict myself, at least that would give us a clue on the users' list about where to look first!

Erick

2011/5/18 Jan Høydahl (JIRA) <j...@apache.org>:
>
>     [ https://issues.apache.org/jira/browse/SOLR-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035796#comment-13035796 ]
>
> Jan Høydahl commented on SOLR-2519:
> -----------------------------------
>
> Largely agree with @Hoss' suggestion. But I think it would be wise to
> emphasize that the example schema is just that - an *example* - encouraging
> people to create new fieldTypes instead of editing the example ones. It's not
> a problem for "int", "date" etc, but for text I always encourage our
> customers and students to stay away from the FieldTypes in the example and
> make their own versions instead.
>
> One way to further encourage this best practice is naming all text FieldTypes
> clearly as examples, e.g.
>
> {code}
> <fieldType name="text_example_en" ..>
> <fieldType name="text_example_generic" ..>
> {code}
>
> We must realize that a lot of non-American users out there are already
> customizing their schemas with the naming pattern "text_<lang>", which means
> you'll find "text_en", "text_it", "text_no" in a lot of installations.
> Therefore it would be unwise to introduce new FieldTypes which clash with
> those names out of the box in version 3.2, thus include _example in the type
> name.
>
> When upgrading, I always leave all the example field types intact, and add my
> custom ones separately, clearly marked by comments for easy copy/paste. I
> believe this to be a fairly common practice, and wanted as well, which would
> give no clashes for the above example.
>
> With this example naming practice, we can be pretty sure that if people talk
> about the fieldType "text_example_en" on the lists, they mean the default
> example type, but if they talk about "text_en", it's something they've
> customized themselves (if so by simply renaming the example). It'll be more
> mental resistance for people to start modifying something with "_example" in
> it without also changing the name.
>
>> Improve the defaults for the "text" field type in default schema.xml
>> --------------------------------------------------------------------
>>
>>                 Key: SOLR-2519
>>                 URL: https://issues.apache.org/jira/browse/SOLR-2519
>>             Project: Solr
>>          Issue Type: Bug
>>            Reporter: Michael McCandless
>>            Assignee: Michael McCandless
>>             Fix For: 3.2, 4.0
>>
>>         Attachments: SOLR-2519.patch
>>
>>
>> Spinoff from: http://lucene.markmail.org/thread/ww6mhfi3rfpngmc5
>> The text fieldType in schema.xml is unusable for non-whitespace
>> languages, because it has the dangerous auto-phrase feature (of
>> Lucene's QP -- see LUCENE-2458) enabled.
>> Lucene leaves this off by default, as does ElasticSearch
>> (http://http://www.elasticsearch.org/).
>> Furthermore, the "text" fieldType uses WhitespaceTokenizer when
>> StandardTokenizer is a better cross-language default.
>> Until we have language specific field types, I think we should fix
>> the "text" fieldType to work well for all languages, by:
>>   * Switching from WhitespaceTokenizer to StandardTokenizer
>>   * Turning off auto-phrase
>
> --
> This message is automatically generated by JIRA.
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org