Paul Taylor wrote:
Koji Sekiguchi wrote:
Koji Sekiguchi wrote:
Paul Taylor wrote:
I want my search to treat 'No. 1' and 'No.1' the same, because in
our context it is one token. I want 'No. 1' to become 'No.1', and I
need to do this before tokenizing, because otherwise the tokenizer
would split one value into two terms and the other into just one term.
I already use a NormalizeMapFilter to map '&' to 'and', but I think it
only takes literal text, and I need to:
1. be case insensitive (but LowerCaseFilter is only applied after
tokenizing)
2. cope with all numbers, e.g. no. 109
So I was going to subclass BaseCharFilter and do my matches with a
regular expression like ([Nn]+[Oo]+\\.) ([0-9]+) but I'm struggling
to understand the offset methods you have to implement once you get a
match. Has anyone already got a regular expression CharFilter, or am
I approaching this all wrong?
Thanks, Paul
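
For illustration, the two requirements above (case insensitivity and any number) can be met with one case-insensitive pattern applied to the raw text before it reaches the tokenizer. The following is a minimal plain-Java sketch, not Lucene code: the class name NoDotNormalizer and the exact pattern are assumptions made for this example. The pattern is slightly stricter than the one in the message (a single "n" and "o", a word boundary in front, one or more whitespace characters before the digits).

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Lucene or Solr.
final class NoDotNormalizer {
    // (?i)  : case-insensitive, so "No.", "NO." and "no." all match
    // \b    : word boundary, so the tail of a word like "piano." is not matched
    // \s+   : the whitespace to drop between "no." and the digits
    private static final Pattern NO_DOT =
            Pattern.compile("(?i)\\b(no\\.)\\s+([0-9]+)");

    // Joins "no." and the following number, keeping the original casing.
    static String normalize(String text) {
        return NO_DOT.matcher(text).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        // prints "Symphony No.1 and no.109"
        System.out.println(normalize("Symphony No. 1 and no. 109"));
    }
}
```

This only rewrites the text. Used inside a CharFilter, the removed characters would also have to be recorded so that token offsets can be mapped back to the original input.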
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Found it at
http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/analysis/PatternReplaceFilter.java?revision=804726&view=markup
BTW, why not add this to the Lucene codebase rather than the Solr
codebase? Unfortunately it doesn't address my problem, because it is a
TokenFilter that works on a TokenStream, but I'm trying to write a
CharFilter to be applied before the text is tokenized, and I'm
struggling with implementing CharStream.correctOffset().
Paul
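
As a rough illustration of what correctOffset() has to achieve: each time the filter drops characters, it can record the position in the filtered output where the drop happened together with the cumulative number of characters removed so far. Correcting an offset then means adding the cumulative difference of the last recorded position at or before it. The sketch below is standalone Java under those assumptions; the class OffsetTrackingReplacer and its methods are hypothetical names for this example and do not use or reproduce the Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone sketch; not the Lucene BaseCharFilter API.
final class OffsetTrackingReplacer {
    private static final Pattern NO_DOT =
            Pattern.compile("(?i)\\b(no\\.)\\s+([0-9]+)");

    private final String output;
    // Parallel lists: from output offset outOffsets[i] onward,
    // cumulativeDiffs[i] characters have been removed from the input.
    private final List<Integer> outOffsets = new ArrayList<>();
    private final List<Integer> cumulativeDiffs = new ArrayList<>();

    OffsetTrackingReplacer(String input) {
        StringBuilder sb = new StringBuilder();
        Matcher m = NO_DOT.matcher(input);
        int last = 0;     // end of the previous match in the input
        int removed = 0;  // total characters dropped so far
        while (m.find()) {
            // Copy everything up to and including "no."
            sb.append(input, last, m.end(1));
            // The whitespace between "no." and the digits is dropped.
            removed += m.start(2) - m.end(1);
            outOffsets.add(sb.length());
            cumulativeDiffs.add(removed);
            // Copy the digits.
            sb.append(input, m.start(2), m.end(2));
            last = m.end();
        }
        sb.append(input, last, input.length());
        output = sb.toString();
    }

    String output() {
        return output;
    }

    // Maps an offset in the filtered output back to the original input.
    int correctOffset(int outOffset) {
        int diff = 0;
        for (int i = 0; i < outOffsets.size(); i++) {
            if (outOffsets.get(i) > outOffset) {
                break;
            }
            diff = cumulativeDiffs.get(i);
        }
        return outOffset + diff;
    }
}
```

With the input "No. 1 and no.  22" the filtered text is "No.1 and no.22", and the digit at output offset 3 maps back to input offset 4, so highlighting based on token offsets would still point at the right characters in the original text. A linear scan keeps the sketch short; a binary search over the recorded offsets would be the obvious refinement for long inputs.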