On 25/02/2013 11:24, Thomas Matthijs wrote:
On Mon, Feb 25, 2013 at 12:19 PM, Thomas Matthijs <li...@selckin.be> wrote:
On Mon, Feb 25, 2013 at 11:30 AM, Thomas Matthijs <li...@selckin.be> wrote:
On Mon, Feb 25, 2013 at 11:24 AM, Paul Taylor <paul_t...@fastmail.fm> wrote:
On 20/02/2013 11:28, Paul Taylor wrote:
I'm just updating my codebase from Lucene 3.6 to Lucene 4.1, and it
seems my tests that use NormalizeCharMap for replacing characters in
the analyzers are not working.
Bump, anybody? I thought a self-contained test case would be enough
to pique somebody's interest. Am I doing something silly? Maybe, but
I can't see it.
Tried to run your test, but it uses MusicbrainzTokenizer.
Well, I made it work. Whether it's a bug that this is required, or
whether it's documented anywhere, I don't know, but it does seem very trappy:
It is documented all the way at the bottom:
http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/analysis/package-summary.html
So it should be:
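import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.charfilter.MappingCharFilter;
import org.apache.lucene.analysis.charfilter.NormalizeCharMap;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

class SimpleAnalyzer extends Analyzer {

    protected NormalizeCharMap charConvertMap;

    public SimpleAnalyzer() {
        // Build the character mapping once: "&" is rewritten to "and".
        NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
        builder.add("&", "and");
        charConvertMap = builder.build();
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new WhitespaceTokenizer(Version.LUCENE_40, reader);
        TokenStream filter = new LowerCaseFilter(Version.LUCENE_40, source);
        return new TokenStreamComponents(source, filter);
    }

    // Wrap the reader with the char filter here, not in createComponents.
    @Override
    protected Reader initReader(String fieldName, Reader reader) {
        return new MappingCharFilter(charConvertMap, reader);
    }
}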
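To make the ordering concrete, here is a minimal sketch in plain Java that uses no Lucene classes at all. It only models the idea behind the fix: the character mapping is applied to the raw Reader first, and the tokenizer then sees the already-mapped text. The class name CharMapSketch and its methods are made up for illustration and are not part of any Lucene API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy model (NOT Lucene code) of mapping characters before tokenizing. */
public class CharMapSketch {

    /** Reads the whole reader into a String. */
    static String readAll(Reader reader) {
        try {
            StringBuilder text = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical stand-in for initReader(): rewrites "&" to "and" in the raw text. */
    static Reader initReader(Reader reader) {
        return new StringReader(readAll(reader).replace("&", "and"));
    }

    /** Hypothetical stand-in for createComponents(): whitespace split, then lower-casing. */
    static List<String> tokenize(Reader reader) {
        List<String> tokens = new ArrayList<>();
        for (String t : readAll(reader).trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Full chain: map characters first, then tokenize. */
    static List<String> analyze(String input) {
        return tokenize(initReader(new StringReader(input)));
    }

    /** Same tokenizer, but with the mapping step skipped. */
    static List<String> analyzeWithoutMapping(String input) {
        return tokenize(new StringReader(input));
    }

    public static void main(String[] args) {
        System.out.println(analyze("Rock & Roll"));               // [rock, and, roll]
        System.out.println(analyzeWithoutMapping("Rock & Roll")); // [rock, &, roll]
    }
}
```

If the mapping step is skipped, the "&" reaches the tokenizer unchanged, which matches the symptom of the replacement silently not happening.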
Thanks Thomas. For some reason I didn't see your post until now, and
I independently worked it out.