On Wed, Sep 12, 2012 at 3:44 AM, Toke Eskildsen <t...@statsbiblioteket.dk> 
wrote:
>
> That would be a serious impediment. For some of our uncontrolled fields,
> the same word can be cased very differently: CD, cd, Cd. To be on the
> safe side, the client would have to ask for 3 times the wanted amount of
> facet information. But if we cannot normalize at index time,
> de-duplication on the server would require changes to the faceting code.

I'll open an issue for this. We should at least fix the analysis
factory APIs to support it, even if the Solr configuration XML
doesn't yet have syntax for it.

>
> Regardless, it sounds like the idea passes the initial sanity check.
> Should I open a JIRA issue for it?

I think you should.

As an ugly workaround to the above problem: you could construct a
Lucene Analyzer with KeywordTokenizer(ICUCollationAtt) followed by
LowerCase/etc. and load that up with <analyzer class=....> in Solr.
I think that will work fine.
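
To show why the lowercase step has to run before the collation key is
generated, here is a rough sketch. It is not the Lucene analyzer itself:
it uses the JDK's java.text.Collator as a stand-in for ICU collation, and
the class and method names (CollationFacetSketch, distinctKeys) are made
up for illustration. It only counts how many distinct collation keys a
set of case variants produces, with and without normalizing first.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; JDK collation stands in for ICU collation.
public class CollationFacetSketch {

    // Returns the number of distinct collation keys produced for the values.
    // When lowercaseFirst is true, each value is lowercased before its key is
    // built, mimicking a LowerCase filter placed ahead of key generation.
    static int distinctKeys(List<String> values, boolean lowercaseFirst) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        // TERTIARY strength treats case differences as significant.
        collator.setStrength(Collator.TERTIARY);

        Set<CollationKey> keys = new HashSet<>();
        for (String value : values) {
            String term = lowercaseFirst ? value.toLowerCase(Locale.ROOT) : value;
            keys.add(collator.getCollationKey(term));
        }
        return keys.size();
    }

    public static void main(String[] args) {
        List<String> variants = Arrays.asList("CD", "cd", "Cd");
        System.out.println("keys without lowercasing: " + distinctKeys(variants, false));
        System.out.println("keys with lowercasing:    " + distinctKeys(variants, true));
    }
}
```

With normalization in the chain, the case variants collapse into a single
facet term, so the client no longer has to over-request facet values.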

-- 
lucidworks.com
