It looks like I wasn't the first to ask the question...
http://stackoverflow.com/questions/17071300/using-charfilter-with-lucene-4-3-0s-standardanalyzer
So, apparently this is what the Lucene designers intended. It feels wrong
considering you could get the same result with 2 lines of code if
CreateComponents were public rather than protected.
Anyway, we have these APIs on Analyzer for use in .NET:

    public static Analyzer NewAnonymous(
        Func<string, TextReader, TokenStreamComponents> createComponents)

    public static Analyzer NewAnonymous(
        Func<string, TextReader, TokenStreamComponents> createComponents,
        ReuseStrategy reuseStrategy)

It looks like we should add:

    public static Analyzer NewAnonymous(
        Func<string, TextReader, TokenStreamComponents> createComponents,
        Func<string, TextReader, TextReader> initReader)

    public static Analyzer NewAnonymous(
        Func<string, TextReader, TokenStreamComponents> createComponents,
        ReuseStrategy reuseStrategy,
        Func<string, TextReader, TextReader> initReader)

That would at least make it possible to apply a CharFilter without having to
create a custom Analyzer subclass. In Java, this API was intended to be used
with anonymous classes, so we need some helper methods to simulate this
behavior in .NET:

    var analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
    {
        return new TokenStreamComponents(...);
    }, initReader: (fieldName, reader) =>
    {
        return new HTMLStripCharFilter(reader);
    });

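For what it's worth, overloads like these could probably be backed by a small delegate-based Analyzer subclass. The following is only a rough sketch under a few assumptions: the class name DelegateAnalyzer is hypothetical, the ReuseStrategy overloads are left out, and CreateComponents and InitReader are assumed to be overridable as protected members, as described in the original message. It is not the actual Lucene.NET implementation.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;

// Hypothetical helper (name and shape are assumptions, not Lucene.NET source).
// It forwards the two overridable Analyzer methods to caller-supplied delegates.
public sealed class DelegateAnalyzer : Analyzer
{
    private readonly Func<string, TextReader, TokenStreamComponents> createComponents;
    private readonly Func<string, TextReader, TextReader> initReader;

    public DelegateAnalyzer(
        Func<string, TextReader, TokenStreamComponents> createComponents,
        Func<string, TextReader, TextReader> initReader = null)
    {
        if (createComponents == null)
            throw new ArgumentNullException(nameof(createComponents));

        this.createComponents = createComponents;
        this.initReader = initReader; // optional; null means "use the default reader"
    }

    // Builds the tokenizer/filter chain using the supplied delegate.
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        return createComponents(fieldName, reader);
    }

    // Wraps the reader (for example in a CharFilter) before tokenization,
    // falling back to the base behavior when no delegate was supplied.
    protected override TextReader InitReader(string fieldName, TextReader reader)
    {
        return initReader != null
            ? initReader(fieldName, reader)
            : base.InitReader(fieldName, reader);
    }
}
```

With something like this in place, each NewAnonymous overload would only need to construct the helper and return it, so the calling code never has to declare its own Analyzer subclass.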
-----Original Message-----
From: Shad Storhaug [mailto:[email protected]]
Sent: Thursday, May 11, 2017 10:37 AM
To: [email protected]
Subject: API Woes
I have updated Itamar's LuceneNetDemo to the new API
(https://github.com/NightOwl888/LuceneNetDemo/tree/update-api-format), but
there is an issue with its API usage I am not quite sure about.
In the original demo code, there is an HtmlStripAnalyzerWrapper class
(https://github.com/synhershko/LuceneNetDemo/blob/master/LuceneNetDemo/Analyzers/HtmlStripAnalyzerWrapper.cs)
that returns the result of _wrappedAnalyzer.CreateComponents(). However, in
Java CreateComponents() was a protected method, so it has been updated to be
protected in .NET. Therefore, this line won't compile.
Since the purpose of the HtmlStripAnalyzerWrapper class is to apply a filter to
the passed-in analyzer, I tried another approach. The InitReader() method is
apparently designed for this specific purpose. So, I tried subclassing the
StandardAnalyzer so I could override the InitReader() method. But
StandardAnalyzer is sealed (as it was in Java).
Is the StandardAnalyzer (or any other analyzer that is marked sealed) not
intended to be used in conjunction with a CharFilter? Or is there a loophole in
Java that makes this somehow possible?
Of course, the workaround is to duplicate most of what StandardAnalyzer does
(https://github.com/NightOwl888/LuceneNetDemo/blob/update-api-format/LuceneNetDemo/Analyzers/HtmlStripAnalyzer.cs),
but it seems like there should be another option here. Is this what the Lucene
designers intended?
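To make the workaround concrete, here is a simplified sketch of the duplicated-analyzer approach. It is not the linked file verbatim: the class name HtmlStripStandardAnalyzer is hypothetical, the chain is only an approximation of what StandardAnalyzer builds (for instance, it omits the max token length handling), and the constructor signatures are assumed to mirror the Java 4.x API.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.CharFilters;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Util;

// Hypothetical name; rebuilds an approximation of the StandardAnalyzer chain
// because StandardAnalyzer itself is sealed and cannot be subclassed.
public class HtmlStripStandardAnalyzer : Analyzer
{
    private readonly LuceneVersion matchVersion;

    public HtmlStripStandardAnalyzer(LuceneVersion matchVersion)
    {
        this.matchVersion = matchVersion;
    }

    // Duplicates the tokenizer and filters that StandardAnalyzer would create.
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        var tokenizer = new StandardTokenizer(matchVersion, reader);
        TokenStream stream = new StandardFilter(matchVersion, tokenizer);
        stream = new LowerCaseFilter(matchVersion, stream);
        stream = new StopFilter(matchVersion, stream, StandardAnalyzer.STOP_WORDS_SET);
        return new TokenStreamComponents(tokenizer, stream);
    }

    // The hook that actually matters here: strip HTML before tokenization.
    protected override TextReader InitReader(string fieldName, TextReader reader)
    {
        return new HTMLStripCharFilter(reader);
    }
}
```

Only the InitReader override is specific to HTML stripping; everything in CreateComponents exists solely because the sealed analyzer cannot be extended.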
Thanks,
Shad Storhaug (NightOwl888)