Actually, you can now exclude certain characters from folding by supplying a Unicode set filter.
This was added in https://issues.apache.org/jira/browse/LUCENE-8129. So apply a filter such as [^\^], which matches every character except ^, and the folding filter will leave ^ in place instead of removing it.

On Mon, Jun 4, 2018 at 10:41 AM, Robert Muir <rcm...@gmail.com> wrote:
> This cannot be "tweaked" at runtime, it is implemented as custom
> normalization.
>
> You can modify the sources / build your own ruleset or use a different
> tokenfilter to normalize characters.
>
> On Mon, Jun 4, 2018 at 9:07 AM, Michael Sokolov <msoko...@gmail.com> wrote:
>> Hi, I'm using ICUFoldingFilter and for the most part it does exactly what I
>> want. However there are some behaviors I'd like to tweak. For example it
>> maps "aaa^bbb" to "aaabbb". I am trying to understand why it does that, and
>> whether there is any way to prevent it.
>>
>> I spent a little time with
>> http://www.unicode.org/reports/tr30/tr30-4.html#UnicodeData which I guess
>> is the basis for what this filter does (it's referenced in the javadocs),
>> but that didn't answer my questions. As an aside, it seems this tech report
>> was withdrawn by the unicode consortium? Not sure what that means if
>> anything, but it seems ominous.
>>
>> Anyway, I would appreciate pointers to more info, and specifically, whether
>> there are any alternatives to the utr30.nrm data file, or any possibility
>> to select among the many transformations this filter applies.
>>
>> Thanks!
>>
>> Mike S

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
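For reference, here is a minimal Java sketch of the Unicode set filter suggested at the top of this reply, for when the filter is built in code rather than through a factory. It assumes a Lucene version that includes LUCENE-8129 (the `ICUFoldingFilter(TokenStream, Normalizer2)` constructor and the public `ICUFoldingFilter.NORMALIZER`), with the lucene-analyzers-icu module and ICU4J on the classpath. The class and method names (`FilteredFoldingExample`, `fold`) are hypothetical, and `KeywordTokenizer` is used only so the whole input comes out as a single token.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import com.ibm.icu.text.FilteredNormalizer2;
import com.ibm.icu.text.Normalizer2;
import com.ibm.icu.text.UnicodeSet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.analysis.icu.ICUFoldingFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class, for illustration only.
public class FilteredFoldingExample {

  /**
   * Runs text through ICUFoldingFilter. If unicodeSetPattern is non-null, folding is
   * applied only to characters inside that UnicodeSet; characters outside it are
   * copied through unchanged. A null pattern means plain, unfiltered folding.
   */
  public static List<String> fold(String text, String unicodeSetPattern) {
    Normalizer2 normalizer = ICUFoldingFilter.NORMALIZER;
    if (unicodeSetPattern != null) {
      // freeze() makes the set immutable so it can be shared safely.
      UnicodeSet set = new UnicodeSet(unicodeSetPattern).freeze();
      normalizer = new FilteredNormalizer2(normalizer, set);
    }

    // KeywordTokenizer emits the entire input as one token, so only the folding is visible.
    Tokenizer tokenizer = new KeywordTokenizer();
    tokenizer.setReader(new StringReader(text));

    List<String> terms = new ArrayList<>();
    // Closing the filter also closes the wrapped tokenizer.
    try (TokenStream stream = new ICUFoldingFilter(tokenizer, normalizer)) {
      CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
      stream.reset();
      while (stream.incrementToken()) {
        terms.add(term.toString());
      }
      stream.end();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return terms;
  }
}
```

With the pattern `"[^\\^]"` (the Java string form of `[^\^]`), `fold("AAA^BBB", "[^\\^]")` should give `aaa^bbb`: the letters are inside the set and still get the full UTR#30 folding (here, lowercasing), while ^ falls outside the set and passes through as-is. With a null pattern, `fold("aaa^bbb", null)` reproduces the `aaabbb` behavior described in the thread. Wrapping the normalizer in ICU4J's `FilteredNormalizer2` keeps the stock utr30.nrm data intact, so there is no need to rebuild the ruleset just to protect a few characters.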