On Wed, Sep 12, 2012 at 3:44 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
>
> That would be a serious impediment. For some of our uncontrolled fields,
> the same word can be cased very differently: CD, cd, Cd. To be on the
> safe side, the client would have to ask for 3 times the wanted amount of
> facet information. But if we cannot normalize at index time,
> de-duplication on the server would require changes to the faceting code.
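
To make the casing problem concrete, here is a minimal sketch in plain Java. It is not Solr's faceting code: the class `FacetCaseDemo` and the method `countFacets` are hypothetical names made up for illustration, and `toLowerCase(Locale.ROOT)` stands in for whatever normalization an index-time filter would apply.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class FacetCaseDemo {

    // Counts occurrences of each value, optionally lowercasing first.
    // Hypothetical stand-in for faceting over an uncontrolled field.
    static Map<String, Integer> countFacets(List<String> values, boolean normalize) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : values) {
            String key = normalize ? value.toLowerCase(Locale.ROOT) : value;
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("CD", "cd", "Cd", "DVD", "dvd");

        // Without normalization every casing is its own facet entry.
        System.out.println(countFacets(values, false));
        // prints {CD=1, Cd=1, DVD=1, cd=1, dvd=1}

        // With normalization the variants collapse into one entry each.
        System.out.println(countFacets(values, true));
        // prints {cd=3, dvd=2}
    }
}
```

Without normalization, a client asking for the top N facet values may get the same word several times, which is why it would have to over-request and merge the results itself.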

I'll open an issue for this. We should at least fix the analysis factory
APIs to support it, even if the Solr configuration XML doesn't yet have
syntax for it.

> Regardless, it sounds like the idea passes the initial sanity check.
> Should I open a JIRA issue for it?

I think you should.

As an ugly workaround to the above problem: you could actually construct
a Lucene Analyzer with KeywordTokenizer(ICUCollationAtt) followed by
LowerCase/etc/etc and load that up with <analyzer class=....> in Solr.
I think that will work fine.

--
lucidworks.com