Re: org.apache.lucene.analysis.icu.ICUNormalizer2Filter -- why Token?

Benson Margulies Mon, 16 Sep 2013 11:11:09 -0700

Thanks, I might pitch in.


On Mon, Sep 16, 2013 at 12:58 PM, Robert Muir <[email protected]> wrote:

> Mostly because our tokenizers like StandardTokenizer will tokenize the
> same way regardless of normalization form, or whether it's normalized at
> all?
>
> But for other tokenizers, such a charfilter should be useful. There is
> a JIRA issue for it, but it has some unresolved issues:
>
> https://issues.apache.org/jira/browse/LUCENE-4072
>
> On Sun, Sep 15, 2013 at 7:05 PM, Benson Margulies <[email protected]> wrote:
> > Can anyone shed light on why this is a token filter and not a char
> > filter? I'm wishing for one of these _upstream_ of a tokenizer, so that
> > the tokenizer's dictionary lookups see normalized contents.
>
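A toy illustration of Benson's point, assuming a hypothetical dictionary-driven segmenter (the `segment` method below is not a Lucene API; only `java.text.Normalizer` from the JDK is real). When the input arrives in decomposed (NFD) form, the segmenter's dictionary lookups miss, and normalizing the resulting tokens afterwards (token-filter style) cannot recover the word, because the segmentation decision was already made on unnormalized text. Normalizing the whole input first (char-filter style) lets the lookup succeed.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class NormalizeBeforeTokenizing {

    // Hypothetical dictionary, stored in NFC (composed) form: "café" uses U+00E9.
    static final Set<String> DICTIONARY = Set.of("caf\u00e9", "noir");

    // Toy greedy longest-match segmenter (hypothetical): at each position take
    // the longest dictionary entry that matches, otherwise emit a single char.
    static List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            String best = null;
            for (String word : DICTIONARY) {
                if (text.startsWith(word, pos) && (best == null || word.length() > best.length())) {
                    best = word;
                }
            }
            String token = (best != null) ? best : text.substring(pos, pos + 1);
            tokens.add(token);
            pos += token.length();
        }
        return tokens;
    }

    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Token-filter style: segment the raw input, then normalize each token.
    static List<String> normalizeAfter(String text) {
        List<String> out = new ArrayList<>();
        for (String token : segment(text)) {
            out.add(nfc(token));
        }
        return out;
    }

    // Char-filter style: normalize the whole input, then segment it.
    static List<String> normalizeBefore(String text) {
        return segment(nfc(text));
    }

    public static void main(String[] args) {
        String input = "cafe\u0301noir"; // "café" in decomposed form: 'e' + U+0301
        // Lookup of "café" misses; we get c, a, f, e, U+0301, noir.
        System.out.println(normalizeAfter(input).size()); // prints 6
        // NFC composes 'e' + U+0301 into U+00E9, so "café" is found.
        System.out.println(normalizeBefore(input).equals(List.of("caf\u00e9", "noir"))); // prints true
    }
}
```

In Lucene the char-filter placement would mean a `CharFilter` wrapping the `Reader` before it reaches the `Tokenizer`, which is what LUCENE-4072 tracks for ICU normalization.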
