[ 
https://issues.apache.org/jira/browse/SOLR-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034172#comment-13034172
 ] 

Hoss Man commented on SOLR-2519:
--------------------------------

I feel like we are conflating two issues here: the "default" behavior of 
TextField, and the example configs.

I don't have any strong opinions about changing the default behavior of 
TextField when {{autoGeneratePhraseQueries}} is not specified in the 
{{<fieldType/>}}, but if we do make such a change, it should be contingent on 
the schema version property (which we should bump) so that people who upgrade 
get consistent behavior with their existing configs (TextField.init 
already has an example of this from when we changed the default of {{omitNorms}}).
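
To make the upgrade concern concrete, here is a rough sketch of the kind of existing config that would be affected. The version value 1.3 and the type name "text_example" are assumptions used only for illustration; the actual version numbers would be decided in the patch.

```xml
<!-- Illustrative fragment only; the version value and the type name are assumptions. -->
<!-- A config that still declares the old schema version would keep the old
     default, so auto-generated phrase queries stay on for this type. -->
<schema name="example" version="1.3">
  <types>
    <!-- autoGeneratePhraseQueries is deliberately not specified here,
         so the version-dependent default applies. -->
    <fieldType name="text_example" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
  </types>
</schema>
```

Under this scheme, someone would only pick up the new default by raising the {{version}} attribute to the bumped value. Setting {{autoGeneratePhraseQueries}} explicitly on the {{<fieldType/>}} should pin the behavior regardless of the version.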

As far as the example configs go: I agree with Yonik that changing "text" at 
this point might be confusing. I think the best way to iterate moving forward 
would probably be:

* rename {{<fieldType name="text"/>}} and {{<field name="text"/>}} to something 
that makes their purpose clearer (text_en, or text_western, or 
text_european, or some other more general descriptive word for the types of 
languages where it makes sense) and switch all existing {{<field/>}} 
declarations that currently use field type "text" to use this new name.

* add a new {{<fieldType name="text_general"/>}} which is designed (and 
documented) to be a general purpose field type when the language is unknown (it 
may make sense to fix/repurpose the existing {{<fieldType name="textgen"/>}} 
for this, since its name already suggests that's what it's for)

* audit all {{<field/>}} declarations that use "text_en" (or whatever name was 
chosen above) and the existing sample data for those fields to see if it makes 
more sense to change them to "text_general". Also change any fields where, 
based on usage, the choice shouldn't matter.

The end result would be that we have no {{<fieldType/>}} named "text" in the 
example configs, so people won't confuse it with previous versions, and 
we'll have a new {{<fieldType/>}} that works as well as possible with all 
languages, which we use as much as possible with the example data.
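
As a rough illustration of the end state described above, a schema.xml fragment might look something like the following. Only StandardTokenizer and turning off auto-phrase for the general type come from the issue description. The version value, the filter chains, and the field names "description_en" and "content" are placeholders assumed for the sketch, and "text_en" stands in for whatever name is finally chosen.

```xml
<!-- Sketch only: version value, filter choices, and field names are illustrative assumptions. -->
<schema name="example" version="1.4">
  <types>
    <!-- Language-specific type replacing the old "text" type -->
    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100"
               autoGeneratePhraseQueries="true">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- General purpose type for content whose language is unknown -->
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"
               autoGeneratePhraseQueries="false">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>

  <fields>
    <!-- Hypothetical field known to hold English text -->
    <field name="description_en" type="text_en" indexed="true" stored="true"/>
    <!-- Hypothetical field whose language is not known in advance -->
    <field name="content" type="text_general" indexed="true" stored="true"/>
  </fields>
</schema>
```

Note that nothing in the fragment is named "text", so a config copied from an older release can't silently pick up different analysis under the same name.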


> Improve the defaults for the "text" field type in default schema.xml
> --------------------------------------------------------------------
>
>                 Key: SOLR-2519
>                 URL: https://issues.apache.org/jira/browse/SOLR-2519
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.2, 4.0
>
>         Attachments: SOLR-2519.patch
>
>
> Spinoff from: http://lucene.markmail.org/thread/ww6mhfi3rfpngmc5
> The text fieldType in schema.xml is unusable for non-whitespace
> languages, because it has the dangerous auto-phrase feature (of
> Lucene's QP -- see LUCENE-2458) enabled.
> Lucene leaves this off by default, as does ElasticSearch
> (http://www.elasticsearch.org/).
> Furthermore, the "text" fieldType uses WhitespaceTokenizer when
> StandardTokenizer is a better cross-language default.
> Until we have language specific field types, I think we should fix
> the "text" fieldType to work well for all languages, by:
>   * Switching from WhitespaceTokenizer to StandardTokenizer
>   * Turning off auto-phrase
