On Feb 16, 2015, at 4:54 PM, Levy, Michael <ml...@ushmm.org> wrote:

> I think you can accomplish what you want by using ICUFoldingFilterFactory
> https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory
>
> which should simply perform ICU (cf http://site.icu-project.org/) based
> character folding (cf. http://www.unicode.org/reports/tr30/tr30-4.html)
>
> In schema.xml I generally have in both index and query:
>
> <tokenizer class="solr.StandardTokenizerFactory"/>
> <filter class="solr.ICUFoldingFilterFactory" />

For unknown reasons, I was unable to load the ICUFoldingFilterFactory, but my interface nonetheless works as expected. Getting there took a combination of things. First, I needed to tell the indexer my content was Spanish; once I did, Solr parsed things correctly. Second, I needed to explicitly tell my Web browser that the search form and returned content were using UTF-8. This was done in the HTTP content-type header, the HTML meta tag, and even in the HTML form. Geesh!

Through this whole process I also learned about Solr’s edismax (extended dismax) query parser. Edismax supports free form queries as well as Boolean logic.

solr++ But also solr+- because Solr is getting more and more and more complicated.

-- Eric “Lost In Chicago” Morgan
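
For anyone following along, here is a minimal sketch of the kind of request described above: an edismax query that mixes free text with Boolean operators, with the query string percent-encoded as UTF-8 so accented Spanish characters survive the trip. The host, core name (`collection1`), and field names (`title`, `text`) are placeholders I made up for illustration, not the actual setup. `defType`, `qf`, and `wt` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_edismax_url(base_url, query, fields):
    """Return a Solr select URL that uses the edismax query parser.

    base_url and fields are hypothetical values supplied by the caller;
    nothing here contacts a server.
    """
    params = {
        "q": query,               # free-form text, may include AND/OR/NOT
        "defType": "edismax",     # ask Solr for the extended dismax parser
        "qf": " ".join(fields),   # fields to search, space separated
        "wt": "json",             # response format
    }
    # urlencode percent-encodes non-ASCII characters as UTF-8 bytes
    return base_url + "?" + urlencode(params, encoding="utf-8")


# Hypothetical endpoint and field names, for illustration only
url = build_edismax_url(
    "http://localhost:8983/solr/collection1/select",
    "niño AND (casa OR hogar)",
    ["title", "text"],
)
print(url)
```

The same idea applies on the browser side: if the form and the page both declare UTF-8, the browser submits the query encoded the same way this sketch does.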