It looks like I wasn't the first to ask the question... 
http://stackoverflow.com/questions/17071300/using-charfilter-with-lucene-4-3-0s-standardanalyzer

So, apparently this is what the Lucene designers intended. It feels wrong, 
considering you could get the same result with two lines of code if 
CreateComponents were public rather than protected.

Anyway, we have these APIs on Analyzer for use in .NET:

public static Analyzer NewAnonymous(
    Func<string, TextReader, TokenStreamComponents> createComponents)
public static Analyzer NewAnonymous(
    Func<string, TextReader, TokenStreamComponents> createComponents,
    ReuseStrategy reuseStrategy)

It looks like we should add:

public static Analyzer NewAnonymous(
    Func<string, TextReader, TokenStreamComponents> createComponents,
    Func<string, TextReader, TextReader> initReader)
public static Analyzer NewAnonymous(
    Func<string, TextReader, TokenStreamComponents> createComponents,
    ReuseStrategy reuseStrategy,
    Func<string, TextReader, TextReader> initReader)

That would at least make it possible to apply a CharFilter without having to 
create a custom Analyzer class. In Java, Analyzer was intended to be used with 
anonymous classes, so we need some helper methods to simulate this behavior in 
.NET:

var analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
{
    return new TokenStreamComponents(...);
}, initReader: (fieldName, reader) => 
{
    return new HTMLStripCharFilter(reader);
});
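
Until such overloads exist, roughly the same thing could be done from user 
code. The following is only a sketch, not part of Lucene.NET: AnalyzerHelper 
and DelegatingAnalyzer are hypothetical names, and it assumes that 
CreateComponents and InitReader can be overridden from outside the assembly, 
as described in this thread.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;

// Hypothetical user-side helper; not part of the Lucene.NET API.
public static class AnalyzerHelper
{
    public static Analyzer Create(
        Func<string, TextReader, TokenStreamComponents> createComponents,
        Func<string, TextReader, TextReader> initReader)
    {
        if (createComponents == null)
            throw new ArgumentNullException(nameof(createComponents));
        if (initReader == null)
            throw new ArgumentNullException(nameof(initReader));
        return new DelegatingAnalyzer(createComponents, initReader);
    }

    // Forwards both overridable methods to the supplied delegates.
    private sealed class DelegatingAnalyzer : Analyzer
    {
        private readonly Func<string, TextReader, TokenStreamComponents> createComponents;
        private readonly Func<string, TextReader, TextReader> initReader;

        public DelegatingAnalyzer(
            Func<string, TextReader, TokenStreamComponents> createComponents,
            Func<string, TextReader, TextReader> initReader)
        {
            this.createComponents = createComponents;
            this.initReader = initReader;
        }

        protected override TokenStreamComponents CreateComponents(
            string fieldName, TextReader reader)
        {
            return createComponents(fieldName, reader);
        }

        // The reader returned here is the one the tokenizer ends up reading.
        protected override TextReader InitReader(string fieldName, TextReader reader)
        {
            return initReader(fieldName, reader);
        }
    }
}
```

A built-in NewAnonymous overload would presumably do much the same thing 
internally, with the extra option of passing a ReuseStrategy through to the 
base constructor.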


-----Original Message-----
From: Shad Storhaug [mailto:[email protected]] 
Sent: Thursday, May 11, 2017 10:37 AM
To: [email protected]
Subject: API Woes

I have updated Itamar's LuceneNetDemo to the new API 
(https://github.com/NightOwl888/LuceneNetDemo/tree/update-api-format), but 
there is an issue with its API usage I am not quite sure about.

In the original demo code, there is an HtmlStripAnalyzerWrapper class 
(https://github.com/synhershko/LuceneNetDemo/blob/master/LuceneNetDemo/Analyzers/HtmlStripAnalyzerWrapper.cs)
that returns the result of _wrappedAnalyzer.CreateComponents(). However, in 
Java CreateComponents() was a protected method, so it has been updated to be 
protected in .NET. Therefore, this line won't compile.

Since the purpose of the HtmlStripAnalyzerWrapper class is to apply a filter to 
the passed-in analyzer, I tried another approach. The InitReader() method is 
apparently designed for this specific purpose. So, I tried subclassing the 
StandardAnalyzer so I could override the InitReader() method. But 
StandardAnalyzer is sealed (as it was in Java).

Is the StandardAnalyzer (or any other analyzer that is marked sealed) not 
intended to be used in conjunction with a CharFilter? Or is there a loophole in 
Java that makes this somehow possible?

Of course, the workaround is to duplicate most of what StandardAnalyzer does 
(https://github.com/NightOwl888/LuceneNetDemo/blob/update-api-format/LuceneNetDemo/Analyzers/HtmlStripAnalyzer.cs),
but it seems like there should be another option here. Is this what the Lucene 
designers intended?
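
For reference, the duplication looks roughly like the sketch below. It is not 
the exact code from the linked file. It assumes the Lucene.NET 4.8 namespace 
layout, rebuilds only the basic StandardAnalyzer chain (tokenizer, standard 
filter, lowercase, stop words), and leaves out details such as the maximum 
token length setting.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.CharFilters;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Util;

// Sketch of a standalone analyzer that mirrors StandardAnalyzer's chain
// and strips HTML from the input before tokenization.
public sealed class HtmlStripAnalyzer : Analyzer
{
    private readonly LuceneVersion matchVersion;

    public HtmlStripAnalyzer(LuceneVersion matchVersion)
    {
        this.matchVersion = matchVersion;
    }

    protected override TokenStreamComponents CreateComponents(
        string fieldName, TextReader reader)
    {
        // Same basic chain StandardAnalyzer builds, minus its configuration options.
        var tokenizer = new StandardTokenizer(matchVersion, reader);
        TokenStream stream = new StandardFilter(matchVersion, tokenizer);
        stream = new LowerCaseFilter(matchVersion, stream);
        stream = new StopFilter(matchVersion, stream, StandardAnalyzer.STOP_WORDS_SET);
        return new TokenStreamComponents(tokenizer, stream);
    }

    // Wrap the incoming reader so markup is removed before the tokenizer sees it.
    protected override TextReader InitReader(string fieldName, TextReader reader)
    {
        return new HTMLStripCharFilter(reader);
    }
}
```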

Thanks,
Shad Storhaug (NightOwl888)