Hi everyone,
I am trying to port forward to 4.2 some Lucene 3.2-era code that uses the
ASCIIFoldingFilter.
The token stream handling has changed significantly since then, and I cannot
figure out what I am doing wrong.
It seems that I should extend AnalyzerWrapper so that I can intercept the
TokenStream and filter it with the ASCIIFoldingFilter.
I have written the following code:
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.AnalyzerWrapper;
import org.apache.lucene.analysis.util.TokenFilterFactory;

public final class TokenFilterAnalyzerWrapper extends AnalyzerWrapper {

    private final Analyzer baseAnalyzer;
    private final TokenFilterFactory tokenFilterFactory;

    public TokenFilterAnalyzerWrapper(Analyzer baseAnalyzer,
            TokenFilterFactory tokenFilterFactory) {
        this.baseAnalyzer = baseAnalyzer;
        this.tokenFilterFactory = tokenFilterFactory;
    }

    @Override
    public void close() {
        baseAnalyzer.close();
        super.close();
    }

    @Override
    protected Analyzer getWrappedAnalyzer(String fieldName) {
        return baseAnalyzer;
    }

    @Override
    protected TokenStreamComponents wrapComponents(String fieldName,
            TokenStreamComponents components) {
        return new TokenStreamComponents(components.getTokenizer(),
                tokenFilterFactory.create(components.getTokenStream()));
    }
}
and the following test case:
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilterFactory;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;
import org.junit.Test;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class TokenFilterAnalyzerWrapperTest {

    @Test
    public void testFilter() throws Exception {
        char[] expected = {'a', 'e', 'i', 'o', 'u'};
        try (Analyzer analyzer = new TokenFilterAnalyzerWrapper(
                new StandardAnalyzer(Version.LUCENE_42),
                new ASCIIFoldingFilterFactory())) {
            TokenStream stream = analyzer.tokenStream("test",
                    new StringReader("a é î ø ü"));
            for (int i = 0; i < 5; i++) {
                assertTrue(stream.incrementToken());
                assertEquals(Character.toString(expected[i]),
                        stream.getAttribute(CharTermAttribute.class).toString());
            }
            assertFalse(stream.incrementToken());
        }
    }
}
but all I can produce is this NullPointerException:
java.lang.NullPointerException
    at org.apache.lucene.analysis.standard.StandardTokenizerImpl.zzRefill(StandardTokenizerImpl.java:923)
    at org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken(StandardTokenizerImpl.java:1133)
    at org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:180)
    at org.apache.lucene.analysis.standard.StandardFilter.incrementToken(StandardFilter.java:49)
    at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
    at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:50)
    at org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter.incrementToken(ASCIIFoldingFilter.java:71)
    at xyz.search.lucene.TokenFilterAnalyzerWrapperTest.testFilter(TokenFilterAnalyzerWrapperTest.java:27)
StandardTokenizerImpl.java:923 is

    /* finally: fill the buffer with new input */
    int numRead = zzReader.read(zzBuffer, zzEndRead, zzBuffer.length - zzEndRead);
The reader (zzReader) is clearly the unexpectedly null value; however, I
cannot figure out how to set it correctly.
Through experimentation, it seems that I can evade some problems by calling
reset() and setReader() at various points, but I always end up at some other
exception buried deep within Lucene, so I believe I am still missing some
piece of the puzzle.
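For reference, here is the consumption pattern I understood the 4.x
TokenStream contract to require — a minimal standalone sketch using a plain
StandardAnalyzer (not my wrapper), in case I have the workflow itself wrong:

```java
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class ConsumeExample {
    public static void main(String[] args) throws Exception {
        try (Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_42)) {
            TokenStream stream = analyzer.tokenStream("test",
                    new StringReader("a é î ø ü"));
            // Grab the term attribute once, before consuming.
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            try {
                stream.reset();                // mandatory before the first incrementToken()
                while (stream.incrementToken()) {
                    System.out.println(term.toString());
                }
                stream.end();                  // record final offset state
            } finally {
                stream.close();                // release resources / allow reuse
            }
        }
    }
}
```

My understanding is that reset() must be called before the first
incrementToken(), and end()/close() afterwards — is that the part my test
case is getting wrong?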
Any help greatly appreciated!
Thanks,
Steven
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]