NightOwl888 opened a new issue, #1126: URL: https://github.com/apache/lucenenet/issues/1126
### Is there an existing issue for this?

- [x] I have searched the existing issues

### Task description

With respect to Lucene 4.8.1, we are missing types from both `TestAllAnalyzersHaveFactories` and `TestRandomChains`.

[`TestAllAnalyzersHaveFactories` uses `TestRandomChains.getClassesForPackage()`](https://github.com/apache/lucene/blob/releases/lucene-solr/4.8.1/lucene/analysis/common/src/test/org/apache/lucene/analysis/core/TestAllAnalyzersHaveFactories.java#L122-L123) in Java to load the types from both referenced and external `.jar`s based on classpath. So, in Java these tests both get their types from the same method.

However, in .NET we are currently only considering the types that are available in `Lucene.Net.Analysis.Common` and not any other assemblies that may contain types from the same namespace. We don't have a common method to retrieve the types.

I did a comparison with `TestAllAnalyzersHaveFactories` and `TestRandomChains` to get a list of all of the missing Tokenizers, Token Filters, and Char Filters as well as checking the reverse to see if we have any that don't exist in Lucene 4.8.1.
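To illustrate the idea of a single shared lookup, here is a minimal Java sketch. It is not code from Lucene or Lucene.NET: `AnalysisTypeLookup`, `Component`, and the nested stand-in classes are hypothetical names invented for this example. The candidate list is passed in explicitly, on the assumption that it would be gathered from every relevant jar or assembly rather than from a single one; how that scanning is done is out of scope here.

```java
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical shared helper that both tests could call (illustration only). */
final class AnalysisTypeLookup {

    // Stand-in for the Tokenizer/TokenFilter/CharFilter base types,
    // so that this sketch compiles without any Lucene dependency.
    abstract static class Component {}

    public static class WhitespaceLikeTokenizer extends Component {}
    public static class LowerCaseLikeFilter extends Component {}
    public abstract static class AbstractBaseFilter extends Component {}
    static class HiddenFilter extends Component {} // deliberately not public

    /**
     * Keeps only the public, concrete subclasses of Component,
     * preserving the order of the input list.
     */
    static List<Class<?>> concreteComponents(List<Class<?>> candidates) {
        List<Class<?>> result = new ArrayList<>();
        for (Class<?> c : candidates) {
            int modifiers = c.getModifiers();
            if (Modifier.isAbstract(modifiers)
                    || !Modifier.isPublic(modifiers)
                    || c.isInterface()
                    || c.isSynthetic()
                    || c.isAnonymousClass()
                    || !Component.class.isAssignableFrom(c)) {
                continue; // not a usable analysis component
            }
            result.add(c);
        }
        return result;
    }

    public static void main(String[] args) {
        // In a real test the candidates would come from scanning all modules.
        List<Class<?>> candidates = Arrays.asList(
                WhitespaceLikeTokenizer.class,
                AbstractBaseFilter.class,
                HiddenFilter.class,
                String.class,
                LowerCaseLikeFilter.class);
        for (Class<?> c : concreteComponents(candidates)) {
            System.out.println(c.getSimpleName());
        }
    }
}
```

If both tests drew their types from one method along these lines, their lists could not drift apart the way the comparison shows.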
# TestRandomChains

## Tokenizers Missing

- `Lucene.Net.Analysis.Cn.Smart.HMMChineseTokenizer`
- `Lucene.Net.Analysis.Icu.Segmentation.ICUTokenizer`
- `Lucene.Net.Analysis.Ja.JapaneseTokenizer`
- `Lucene.Net.Analysis.MockTokenizer`
- `Lucene.Net.Analysis.OpenNlp.OpenNLPTokenizer`
- `Lucene.Net.Analysis.Th.ThaiTokenizer`

## TokenFilters Missing

- `Lucene.Net.Analysis.CachingTokenFilter`
- `Lucene.Net.Analysis.Icu.ICUFoldingFilter`
- `Lucene.Net.Analysis.Icu.ICUNormalizer2Filter`
- `Lucene.Net.Analysis.Icu.ICUTransformFilter`
- `Lucene.Net.Analysis.Ja.JapaneseBaseFormFilter`
- `Lucene.Net.Analysis.Ja.JapaneseKatakanaStemFilter`
- `Lucene.Net.Analysis.Ja.JapanesePartOfSpeechStopFilter`
- `Lucene.Net.Analysis.Ja.JapaneseReadingFormFilter`
- `Lucene.Net.Analysis.MockFixedLengthPayloadFilter`
- `Lucene.Net.Analysis.MockGraphTokenFilter`
- `Lucene.Net.Analysis.MockHoleInjectingTokenFilter`
- `Lucene.Net.Analysis.MockRandomLookaheadTokenFilter`
- `Lucene.Net.Analysis.MockTokenFilter`
- `Lucene.Net.Analysis.MockVariableLengthPayloadFilter`
- `Lucene.Net.Analysis.Morfologik.MorfologikFilter`
- `Lucene.Net.Analysis.OpenNlp.OpenNLPChunkerFilter`
- `Lucene.Net.Analysis.OpenNlp.OpenNLPLemmatizerFilter`
- `Lucene.Net.Analysis.OpenNlp.OpenNLPPOSFilter`
- `Lucene.Net.Analysis.Phonetic.BeiderMorseFilter`
- `Lucene.Net.Analysis.Phonetic.DoubleMetaphoneFilter`
- `Lucene.Net.Analysis.Phonetic.PhoneticFilter`
- `Lucene.Net.Analysis.Stempel.StempelFilter`
- `Lucene.Net.Analysis.TrivialLookaheadFilter`
- `Lucene.Net.Analysis.ValidatingTokenFilter`
- `Lucene.Net.TestFramework.Analysis.CrankyTokenFilter`

## CharFilters Missing

- `Lucene.Net.Analysis.Icu.ICUNormalizer2CharFilter`
- `Lucene.Net.Analysis.Ja.JapaneseIterationMarkCharFilter`
- `Lucene.Net.Analysis.MockCharFilter`

## Tokenizers Extra

- *(No entries)*

## TokenFilters Extra

- `Lucene.Net.Analysis.Fa.PersianStemFilter`
  - This was contributed by the Lucene.NET community.
- `Lucene.Net.Analysis.Miscellaneous.TypeAsSynonymFilter`
  - This was added from Lucene 8.2.0 because the opennlp module calls it out in the documentation.

## CharFilters Extra

- `Lucene.Net.Analysis.Util.BufferedCharFilter`
  - This was created to add `BufferedReader` support to `CharFilter` for specific cases that require buffering.

# TestAllAnalyzersHaveFactories

## Tokenizers Missing

- `Lucene.Net.Analysis.Cn.Smart.HMMChineseTokenizer`
- `Lucene.Net.Analysis.Icu.Segmentation.ICUTokenizer`
- `Lucene.Net.Analysis.Ja.JapaneseTokenizer`
- `Lucene.Net.Analysis.OpenNlp.OpenNLPTokenizer`
- `Lucene.Net.Analysis.Th.ThaiTokenizer`

## TokenFilters Missing

- `Lucene.Net.Analysis.Icu.ICUFoldingFilter`
- `Lucene.Net.Analysis.Icu.ICUNormalizer2Filter`
- `Lucene.Net.Analysis.Icu.ICUTransformFilter`
- `Lucene.Net.Analysis.Ja.JapaneseBaseFormFilter`
- `Lucene.Net.Analysis.Ja.JapaneseKatakanaStemFilter`
- `Lucene.Net.Analysis.Ja.JapanesePartOfSpeechStopFilter`
- `Lucene.Net.Analysis.Ja.JapaneseReadingFormFilter`
- `Lucene.Net.Analysis.Morfologik.MorfologikFilter`
- `Lucene.Net.Analysis.OpenNlp.OpenNLPChunkerFilter`
- `Lucene.Net.Analysis.OpenNlp.OpenNLPLemmatizerFilter`
- `Lucene.Net.Analysis.OpenNlp.OpenNLPPOSFilter`
- `Lucene.Net.Analysis.Phonetic.BeiderMorseFilter`
- `Lucene.Net.Analysis.Phonetic.DoubleMetaphoneFilter`
- `Lucene.Net.Analysis.Phonetic.PhoneticFilter`
- `Lucene.Net.Analysis.Stempel.StempelFilter`
- `Lucene.Net.Analysis.TrivialLookaheadFilter`

## CharFilters Missing

- `Lucene.Net.Analysis.Icu.ICUNormalizer2CharFilter`
- `Lucene.Net.Analysis.Ja.JapaneseIterationMarkCharFilter`

## Tokenizers Extra

- *(No entries)*

## TokenFilters Extra

- `Lucene.Net.Analysis.Fa.PersianStemFilter`
- `Lucene.Net.Analysis.Miscellaneous.TypeAsSynonymFilter`

## CharFilters Extra

- *(No entries)*

-------------------------

A few ways we could address this:

1. Add the references to the other projects that contain the above types.
2. Load the assemblies for the above types programmatically in some way.
3. Port the system that was created for Lucene 9.1.0 in https://issues.apache.org/jira/browse/LUCENE-10352.

-------------------------

In Java, both tests will fail on Lucene 8.8.1 (using jdk 1.8.0_202) and Lucene 4.8.1 (using jdk 1.8.0_302). There are problems both with using Reflection on the constructors and with loading resources. I suspected there have been security patches in recent versions of Java 8 that invalidated the old way of loading these types, but I checked with [Java SE Development Kit 8u25](https://lucenenet.apache.org/contributing/how-to-setup-java-lucene-debugging.html#installing-java-8), and it isn't working.

I was able to get `TestRandomChains` running with the following code in the loop of the `beforeClass()` method:

```java
String name = c.getName();
// Constructors don't resolve
if (name.equals("org.apache.lucene.analysis.icu.ICUNormalizer2CharFilter")
    || name.equals("org.apache.lucene.analysis.icu.segmentation.ICUTokenizer")
    || name.equals("org.apache.lucene.analysis.icu.ICUNormalizer2Filter")
    || name.equals("org.apache.lucene.analysis.icu.ICUTransformFilter")
    || name.equals("org.apache.lucene.analysis.ja.JapaneseTokenizer")
    || name.equals("org.apache.lucene.analysis.phonetic.BeiderMorseFilter")
    || name.equals("org.apache.lucene.analysis.phonetic.PhoneticFilter")
    || name.equals("org.apache.lucene.analysis.stempel.StempelFilter")
    || name.equals("org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer")
    || name.equals("org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer")
    // Resources don't resolve
    || name.equals("org.apache.lucene.analysis.morfologik.MorfologikFilter")
) {
  continue;
}
```

However, it still tends to crash when running tests with any of the other components that are in non-referenced packages. I suspect it is due to a failure when loading resources.

-- This is an automated message from the Apache Git Service.