: I disagree with Hoss on this issue, removing diacritics in a filter is
: not going to "mess up highlighting". The offsets are set by the
: tokenizer. So it's no different than stemming or any other process.

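To make the point about offsets concrete, here is a minimal sketch in plain Python. It is not Lucene's API: the functions tokenize, strip_diacritics and highlight are hypothetical stand-ins for a tokenizer, a token filter and a highlighter, and the sample text is made up. It only illustrates the idea that offsets recorded at tokenization time survive a filter that rewrites the term text.

```python
import re
import unicodedata

def tokenize(text):
    """Stand-in tokenizer: records (term, start, end) against the original text."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]

def strip_diacritics(tokens):
    """Stand-in token filter: rewrites the term only; offsets pass through untouched."""
    out = []
    for term, start, end in tokens:
        decomposed = unicodedata.normalize("NFD", term)
        folded = "".join(c for c in decomposed if not unicodedata.combining(c))
        out.append((folded.lower(), start, end))
    return out

def highlight(text, tokens, query):
    """Stand-in highlighter: matches on the filtered term, slices the ORIGINAL text."""
    result, last = [], 0
    for term, start, end in tokens:
        if term == query:
            result.append(text[last:start])
            result.append("<em>" + text[start:end] + "</em>")
            last = end
    result.append(text[last:])
    return "".join(result)

text = "Un caf\u00e9 cr\u00e8me"          # made-up sample input
tokens = strip_diacritics(tokenize(text))
highlighted = highlight(text, tokens, "cafe")
print(tokens)
print(highlighted)
```

The query "cafe" matches the folded term, but the highlighted span is cut from the original text using the offsets the tokenizer recorded, so the accented word is what gets wrapped.
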
thanks for correcting me dude ... i'm not sure what i was thinking of, but
for some reason i thought there was an issue with the highlighter and token
filters that changed the lengths of tokens (including stemming).

: The *only* situation where you should use a CharFilter, is when you
: must change this stuff before the tokenizer.

Can you elaborate on that, because it's definitely something that i'm
getting more and more confused by, so i'm sure other people are confused
as well. what is an example of a situation where you "must" change stuff
before the tokenizer? the HTML Stripper is the one example i understand,
but the purpose of the mapping char filter no longer makes sense to me in
light of this thread.

-Hoss