Hi all, I am in the process of creating a patch for Lucene. However, I can't get the JUnit test TestAllAnalyzersHaveFactories to pass. I hope this is the right forum for help; if not, kindly direct me to the correct one. Any help is greatly appreciated!
First, some background. The patch builds on Ted Sullivan's work in SOLR-7136. It is an enhanced version of AutoPhrase, which I would like to submit to the community. The code includes a new TokenFilter, AutoPhrasingTokenFilter, along with JUnit tests.

I have created the following package:

org.apache.lucene.analysis.autophrase

This package contains the following class files:

AutoPhraseDetector.java
AutoPhrasingTokenFilter.java
AutoPhrasingTokenFilterFactory.java
package-info.java

When I run the tests under ant, TestAllAnalyzersHaveFactories fails with the following output (I have added some print statements for debugging):

============================================================
-test:
[junit4] <JUnit4> says ????! Master seed: 86F1C35C6CE11696
[junit4] Your default console's encoding may not display certain unicode glyphs: US-ASCII
[junit4] Executing 1 suite with 1 JVM.
[junit4]
[junit4] Started J0 PID(15156@localhost).
[junit4] Suite: org.apache.lucene.analysis.core.TestAllAnalyzersHaveFactories
[junit4] 1> clazzName: IndicNormalizationFilter
[junit4] 1> simpleName: IndicNormalization
[junit4] 1> clazzName: HyphenationCompoundWordTokenFilter
[junit4] 1> simpleName: HyphenationCompoundWord
[junit4] 1> clazzName: DictionaryCompoundWordTokenFilter
[junit4] 1> simpleName: DictionaryCompoundWord
[junit4] 1> clazzName: BulgarianStemFilter
[junit4] 1> simpleName: BulgarianStem
[junit4] 1> clazzName: ShingleFilter
[junit4] 1> simpleName: Shingle
[junit4] 1> clazzName: ReverseStringFilter
[junit4] 1> simpleName: ReverseString
[junit4] 1> clazzName: GreekLowerCaseFilter
[junit4] 1> simpleName: GreekLowerCase
[junit4] 1> clazzName: GreekStemFilter
[junit4] 1> simpleName: GreekStem
[junit4] 1> clazzName: HungarianLightStemFilter
[junit4] 1> simpleName: HungarianLightStem
[junit4] 1> clazzName: GermanNormalizationFilter
[junit4] 1> simpleName: GermanNormalization
[junit4] 1> clazzName: GermanLightStemFilter
[junit4] 1> simpleName: GermanLightStem
[junit4] 1> clazzName: GermanMinimalStemFilter
[junit4] 1> simpleName: GermanMinimalStem
[junit4] 1> clazzName: GermanStemFilter
[junit4] 1> simpleName: GermanStem
[junit4] 1> clazzName: EnglishPossessiveFilter
[junit4] 1> simpleName: EnglishPossessive
[junit4] 1> clazzName: EnglishMinimalStemFilter
[junit4] 1> simpleName: EnglishMinimalStem
[junit4] 1> clazzName: PorterStemFilter
[junit4] 1> simpleName: PorterStem
[junit4] 1> clazzName: KStemFilter
[junit4] 1> simpleName: KStem
[junit4] 1> clazzName: ItalianLightStemFilter
[junit4] 1> simpleName: ItalianLightStem
[junit4] 1> clazzName: HindiStemFilter
[junit4] 1> simpleName: HindiStem
[junit4] 1> clazzName: HindiNormalizationFilter
[junit4] 1> simpleName: HindiNormalization
[junit4] 1> clazzName: RussianLightStemFilter
[junit4] 1> simpleName: RussianLightStem
[junit4] 1> clazzName: ClassicFilter
[junit4] 1> simpleName: Classic
[junit4] 1> clazzName: StandardFilter
[junit4] 1> simpleName: Standard
[junit4] 1> clazzName: CzechStemFilter
[junit4] 1> simpleName: CzechStem
[junit4] 1> clazzName: ElisionFilter
[junit4] 1> simpleName: Elision
[junit4] 1> clazzName: DelimitedPayloadTokenFilter
[junit4] 1> simpleName: DelimitedPayload
[junit4] 1> clazzName: TokenOffsetPayloadTokenFilter
[junit4] 1> simpleName: TokenOffsetPayload
[junit4] 1> clazzName: NumericPayloadTokenFilter
[junit4] 1> simpleName: NumericPayload
[junit4] 1> clazzName: TypeAsPayloadTokenFilter
[junit4] 1> simpleName: TypeAsPayload
[junit4] 1> clazzName: AutoPhrasingTokenFilter
[junit4] 1> simpleName: AutoPhrasing
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestAllAnalyzersHaveFactories -Dtests.method=test -Dtests.seed=86F1C35C6CE11696 -Dtests.slow=true -Dtests.locale=zh_CN -Dtests.timezone=US/Samoa -Dtests.asserts=true -Dtests.file.encoding=UTF-8
[junit4] ERROR 2.94s | TestAllAnalyzersHaveFactories.test <<<
[junit4] > Throwable #1: java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.analysis.util.TokenFilterFactory with name 'AutoPhrasing' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [apostrophe, arabicnormalization, arabicstem, bulgarianstem, brazilianstem, cjkbigram, cjkwidth, soraninormalization, soranistem, commongrams, commongramsquery, dictionarycompoundword, hyphenationcompoundword, decimaldigit, lowercase, stop, type, uppercase, czechstem, germanlightstem, germanminimalstem, germannormalization, germanstem, greeklowercase, greekstem, englishminimalstem, englishpossessive, kstem, porterstem, spanishlightstem, persiannormalization, finnishlightstem, frenchlightstem, frenchminimalstem, irishlowercase, galicianminimalstem, galicianstem, hindinormalization, hindistem, hungarianlightstem, hunspellstem, indonesianstem, indicnormalization, italianlightstem, latvianstem, asciifolding, capitalization, codepointcount, fingerprint, hyphenatedwords, keepword, keywordmarker, keywordrepeat, length, limittokencount, limittokenoffset, limittokenposition, removeduplicates, stemmeroverride, trim, truncate, worddelimiter, scandinavianfolding, scandinaviannormalization, edgengram, ngram, norwegianlightstem, norwegianminimalstem, patternreplace, patterncapturegroup, delimitedpayload, numericpayload, tokenoffsetpayload, typeaspayload, portugueselightstem, portugueseminimalstem, portuguesestem, reversestring, russianlightstem, shingle, snowballporter, serbiannormalization, classic, standard, swedishlightstem, synonym, turkishlowercase, elision]
[junit4] > at __randomizedtesting.SeedInfo.seed([86F1C35C6CE11696:EA5FC86C21D7B6E]:0)
[junit4] > at org.apache.lucene.analysis.util.AnalysisSPILoader.lookupClass(AnalysisSPILoader.java:135)
[junit4] > at org.apache.lucene.analysis.util.TokenFilterFactory.lookupClass(TokenFilterFactory.java:42)
[junit4] > at org.apache.lucene.analysis.core.TestAllAnalyzersHaveFactories.test(TestAllAnalyzersHaveFactories.java:168)
[junit4] > at java.lang.Thread.run(Thread.java:745)
[junit4] 2> NOTE: test params are: codec=CheapBastard, sim=ClassicSimilarity, locale=zh_CN, timezone=US/Samoa
[junit4] 2> NOTE: Linux 2.6.32-358.el6.x86_64 amd64/Oracle Corporation 1.8.0_05 (64-bit)/cpus=4,threads=1,free=136794808,total=160432128
[junit4] 2> NOTE: All tests run in this JVM: [TestAllAnalyzersHaveFactories]
[junit4] Completed [1/1] in 4.33s, 1 test, 1 error <<< FAILURES!
[junit4]
[junit4]
[junit4] Tests with failures [seed: 86F1C35C6CE11696]:
[junit4] - org.apache.lucene.analysis.core.TestAllAnalyzersHaveFactories.test
[junit4]
[junit4]
[junit4] JVM J0: 0.66 .. 6.09 = 5.44s
[junit4] Execution time total: 6.11 sec.
[junit4] Tests summary: 1 suite, 1 test, 1 error
================================================

Running the test under the debugger in Eclipse gives the same error message, but for a different factory class, 'DaitchMokitoffSoundex'. This may or may not be related to my issue; I'm not sure. My guess is that there is some sort of class loader issue.

My understanding of the test is that it makes sure there is a corresponding TokenFilterFactory for every TokenFilter. In this case, that would be AutoPhrasingTokenFilterFactory. I have checked that the factory class is compiled; the 'find' command shows it at:

build/analysis/common/classes/java/org/apache/lucene/analysis/autophrase/AutoPhrasingTokenFilterFactory.class

This location is similar to that of the other filter factories.

I have added print statements and also stepped through the test in the Eclipse debugger. As far as I can see, the test does see AutoPhrasingTokenFilter: in TestAllAnalyzersHaveFactories.java, the call at the line marked '1>' returns that class.
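To make the name derivation concrete, here is a minimal standalone sketch of the suffix stripping that the TokenFilter branch of TestAllAnalyzersHaveFactories.test() performs before it calls TokenFilterFactory.lookupClass(). The class SpiNameSketch and the method spiNameForFilter are hypothetical names used only for illustration; the logic is taken from the test (strip "TokenFilter", 11 characters, if present, otherwise strip "Filter", 6 characters), and the sketch does not touch Lucene's actual SPI loader.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only (not part of Lucene).
final class SpiNameSketch {

  // Mirrors the test's TokenFilter branch: the class name must end with
  // "Filter"; drop "TokenFilter" (11 chars) if present, else "Filter" (6 chars).
  static String spiNameForFilter(String simpleClassName) {
    if (!simpleClassName.endsWith("Filter")) {
      // The test uses assertTrue here; the sketch throws instead.
      throw new IllegalArgumentException(simpleClassName + " does not end with \"Filter\"");
    }
    int suffixLength = simpleClassName.endsWith("TokenFilter") ? 11 : 6;
    return simpleClassName.substring(0, simpleClassName.length() - suffixLength);
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList(
        "AutoPhrasingTokenFilter", "GermanStemFilter", "DelimitedPayloadTokenFilter");
    for (String name : names) {
      // Prints "<class name> -> <name passed to lookupClass>"
      System.out.println(name + " -> " + spiNameForFilter(name));
    }
  }
}
```

Running it prints AutoPhrasing, GermanStem and DelimitedPayload as the derived names, the same simpleName values as in the debug output above, so the name 'AutoPhrasing' itself is derived as expected.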
However, when the test reaches the line marked '2>', it fails:

===========================================
public void test() throws Exception {
1>  List<Class<?>> analysisClasses = TestRandomChains.getClassesForPackage("org.apache.lucene.analysis");
    ClassLoader cl = ClassLoader.getSystemClassLoader();
    URL[] urls = ((URLClassLoader) cl).getURLs();
    // System.out.println("ClassPath Start:");
    for (URL url : urls) {
      // System.out.println(url.getFile());
    }
    // System.out.println("ClassPath Ends!");
    for (final Class<?> c : analysisClasses) {
      final int modifiers = c.getModifiers();
      if (
        // don't waste time with abstract classes
        Modifier.isAbstract(modifiers) || !Modifier.isPublic(modifiers)
        || c.isSynthetic() || c.isAnonymousClass() || c.isMemberClass() || c.isInterface()
        || testComponents.contains(c)
        || crazyComponents.contains(c)
        || oddlyNamedComponents.contains(c)
        || c.isAnnotationPresent(Deprecated.class) // deprecated ones are typically back compat hacks
        || !(Tokenizer.class.isAssignableFrom(c) || TokenFilter.class.isAssignableFrom(c) || CharFilter.class.isAssignableFrom(c))
      ) {
        continue;
      }
      Map<String,String> args = new HashMap<>();
      args.put("luceneMatchVersion", Version.LATEST.toString());
      if (Tokenizer.class.isAssignableFrom(c)) {
        String clazzName = c.getSimpleName();
        assertTrue(clazzName.endsWith("Tokenizer"));
        String simpleName = clazzName.substring(0, clazzName.length() - 9);
        assertNotNull(TokenizerFactory.lookupClass(simpleName));
        TokenizerFactory instance = null;
        try {
          instance = TokenizerFactory.forName(simpleName, args);
          assertNotNull(instance);
          if (instance instanceof ResourceLoaderAware) {
            ((ResourceLoaderAware) instance).inform(loader);
          }
          assertSame(c, instance.create().getClass());
        } catch (IllegalArgumentException e) {
          if (e.getCause() instanceof NoSuchMethodException) {
            // there is no corresponding ctor available
            throw e;
          }
          // TODO: For now pass because some factories have not yet a default config that always works
        }
      } else if (TokenFilter.class.isAssignableFrom(c)) {
        String clazzName = c.getSimpleName();
        System.out.println("clazzName: " + clazzName);
        assertTrue(clazzName.endsWith("Filter"));
        String simpleName = clazzName.substring(0, clazzName.length() - (clazzName.endsWith("TokenFilter") ? 11 : 6));
        System.out.println("simpleName: " + simpleName);
2>      assertNotNull(TokenFilterFactory.lookupClass(simpleName));
=====================================================

Here is the code for the factory class:

package org.apache.lucene.analysis.autophrase;

/*
 * Copyright 2015 Synopsys, Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License"); you
 * may not use this file except in compliance with the License. You may
 * obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.analysis.util.ResourceLoader;
import org.apache.lucene.analysis.util.ResourceLoaderAware;
import org.apache.lucene.analysis.util.TokenFilterFactory;

public class AutoPhrasingTokenFilterFactory extends TokenFilterFactory implements ResourceLoaderAware {

  private CharArraySet phraseSets;
  private final String phraseSetFiles;
  private final boolean ignoreCase;
  private final boolean emitSingleTokens;
  private final boolean quotePhrase;
  private final boolean emitAmbiguousPhrases;
  private String replaceWhitespaceWith = null;

  public AutoPhrasingTokenFilterFactory(Map<String, String> initArgs) {
    super(initArgs);
    phraseSetFiles = get(initArgs, "phrases");
    ignoreCase = getBoolean(initArgs, "ignoreCase", false);
    emitSingleTokens = getBoolean(initArgs, "includeTokens", false);
    quotePhrase = getBoolean(initArgs, "quotePhrase", false);
    emitAmbiguousPhrases = getBoolean(initArgs, "emitAmbiguousPhrases", false);
    String replaceWhitespaceArg = get(initArgs, "replaceWhitespaceWith");
    if (replaceWhitespaceArg != null) {
      replaceWhitespaceWith = replaceWhitespaceArg;
    }
  }

  @Override
  public void inform(ResourceLoader loader) throws IOException {
    if (phraseSetFiles != null) {
      phraseSets = getWordSet(loader, phraseSetFiles, ignoreCase);
    }
  }

  @Override
  public TokenStream create(TokenStream input) {
    AutoPhrasingTokenFilter autoPhraseFilter = new AutoPhrasingTokenFilter(input, phraseSets, emitSingleTokens);
    if (replaceWhitespaceWith != null) {
      autoPhraseFilter.setReplaceWhitespaceWith(Character.valueOf(replaceWhitespaceWith.charAt(0)));
    }
    // It doesn't make sense to emit phrases in double quotes if the replaceWhitespaceWith character is set.
    if ((replaceWhitespaceWith == null) && quotePhrase) {
      autoPhraseFilter.setQuotePhrase(quotePhrase);
    }
    if (emitAmbiguousPhrases) {
      autoPhraseFilter.setEmitAmbiguousPhrases(emitAmbiguousPhrases);
    }
    return autoPhraseFilter;
  }
}

Thanks,
Koorosh