Re: [jira] [Commented] (SOLR-2519) Improve the defaults for the "text" field type in default schema.xml

Erick Erickson Wed, 18 May 2011 18:50:35 -0700

+1. I've seen far too many implementations of Solr that blindly use
the example configurations and then wonder why the results are
surprising (WordDelimiterFilterFactory by itself has confused more
people than I can recollect).


Although, just to contradict myself, I guess if people don't really
look at the configs, they deserve the consequences...

And to contra-contradict myself, at least that would give us a clue on
the user's list about where to look first!

Erick

2011/5/18 Jan Høydahl (JIRA) <j...@apache.org>:
>
>    [ https://issues.apache.org/jira/browse/SOLR-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035796#comment-13035796 ]
>
> Jan Høydahl commented on SOLR-2519:
> -----------------------------------
>
> Largely agree with @Hoss' suggestion. But I think it would be wise to 
> emphasize that the example schema is just that - an *example* - encouraging 
> people to create new fieldTypes instead of editing the example ones. It's not 
> a problem for "int", "date" etc, but for text I always encourage our 
> customers and students to stay away from the FieldTypes in the example and 
> make their own versions instead.
>
> One way to further encourage this best practice is naming all text FieldTypes 
> clearly as examples, e.g.
>
> {code}
> <fieldType name="text_example_en" ..>
> <fieldType name="text_example_generic" ..>
> {code}
>
> We must realize that a lot of non-American users out there are already
> customizing their schemas with the naming pattern "text_<lang>", which means
> you'll find "text_en", "text_it", "text_no" in a lot of installations.
> Therefore it would be unwise to introduce new FieldTypes which clash with
> those names out of the box in version 3.2, so we should include "_example"
> in the type names.
>
> When upgrading, I always leave all the example field types intact and add my
> custom ones separately, clearly marked by comments for easy copy/paste. I
> believe this is a fairly common practice, and a desirable one, and with the
> naming above it would cause no clashes.
>
> With this example naming practice, we can be pretty sure that if people talk
> about the fieldType "text_example_en" on the lists, they mean the default
> example type, but if they talk about "text_en", it's something they've
> customized themselves (even if only by renaming the example). People will
> also feel more mental resistance to modifying something with "_example" in
> its name without also renaming it.
>
>> Improve the defaults for the "text" field type in default schema.xml
>> --------------------------------------------------------------------
>>
>>                 Key: SOLR-2519
>>                 URL: https://issues.apache.org/jira/browse/SOLR-2519
>>             Project: Solr
>>          Issue Type: Bug
>>            Reporter: Michael McCandless
>>            Assignee: Michael McCandless
>>             Fix For: 3.2, 4.0
>>
>>         Attachments: SOLR-2519.patch
>>
>>
>> Spinoff from: http://lucene.markmail.org/thread/ww6mhfi3rfpngmc5
>> The text fieldType in schema.xml is unusable for non-whitespace
>> languages, because it has the dangerous auto-phrase feature (of
>> Lucene's QP -- see LUCENE-2458) enabled.
>> Lucene leaves this off by default, as does ElasticSearch
>> (http://www.elasticsearch.org/).
>> Furthermore, the "text" fieldType uses WhitespaceTokenizer when
>> StandardTokenizer is a better cross-language default.
>> Until we have language specific field types, I think we should fix
>> the "text" fieldType to work well for all languages, by:
>>   * Switching from WhitespaceTokenizer to StandardTokenizer
>>   * Turning off auto-phrase
>
> --
> This message is automatically generated by JIRA.
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>
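
For illustration only, here is a minimal sketch of how the two changes proposed in the quoted issue description (StandardTokenizer in place of WhitespaceTokenizer, auto-phrase turned off) might look as a Solr 3.x schema.xml fieldType. It is not taken from the attached SOLR-2519.patch. The LowerCaseFilterFactory step and the positionIncrementGap value are assumptions added only to make the fragment complete.

```xml
<!-- Sketch only: not the SOLR-2519 patch. The filter chain is an assumed minimal example. -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="false">
  <analyzer>
    <!-- StandardTokenizer instead of WhitespaceTokenizer, as the issue proposes -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Assumed extra step: lowercase tokens at both index and query time -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With autoGeneratePhraseQueries="false", an unquoted query chunk that the analyzer splits into several tokens is not silently turned into a phrase query, which is the behavior the issue describes as harmful for languages that do not separate words with whitespace.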
