I created a working Portuguese stemmer (http://people.apache.org/~digy/PortugueseStemmerNew.cs) from the code at http://snowball.tartarus.org/archives/snowball-discuss/0943.html (http://snowball.tartarus.org/archives/snowball-discuss/att-0943/01-SnowballC Sharp.zip).
Since that code is under a BSD license (http://snowball.tartarus.org/license.php), I don't think I can use it to update PortugueseStemmer.cs under contrib.

DIGY

-----Original Message-----
From: Robert Stewart [mailto:robert_stew...@epam.com]
Sent: Tuesday, September 13, 2011 5:55 PM
To: <lucene-net-...@lucene.apache.org>
Subject: Re: [Lucene.Net] Test case for: possible infinite loop bug in portuguese snowball stemmer?

Here is a test case:

    // Feed a single keyword token through the Portuguese Snowball stemmer.
    string text = @"Califórnia";
    Lucene.Net.Analysis.KeywordTokenizer tokenizer =
        new Lucene.Net.Analysis.KeywordTokenizer(new System.IO.StringReader(text));
    Lucene.Net.Analysis.Snowball.SnowballFilter stemmer =
        new Lucene.Net.Analysis.Snowball.SnowballFilter(tokenizer, "Portuguese");
    Lucene.Net.Analysis.Token token;
    while ((token = stemmer.Next()) != null)
    {
        // Print the stemmed term (the original snippet referenced an undefined "tokenText").
        System.Console.WriteLine(token.Term());
    }

It seems to go into an infinite loop: the call to stemmer.Next() never returns.

I'm not sure whether this is the only stemmer causing trouble, but the problem happens to us on a near-daily basis.

Thanks,
Bob

On Sep 13, 2011, at 9:37 AM, Robert Stewart wrote:

> Are there any known issues with the Snowball stemmers (Portuguese in particular) going into an infinite loop? I have a recurring problem where IndexWriter locks up in AddDocument and never returns (it has taken up to 3 days before we noticed), and the process has to be killed manually. From what I can tell so far, it happens only on Portuguese documents, and the stack trace when the thread is aborted is always as follows:
>
> System.Threading.ThreadAbortException: Thread was being aborted.
> at System.RuntimeMethodHandle._InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeType typeOwner)
> at System.RuntimeMethodHandle.InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeType typeOwner)
> at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)
> at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
> at Lucene.Net.Analysis.Snowball.SnowballFilter.Next()
> System.SystemException: System.Threading.ThreadAbortException: Thread was being aborted.
> at System.RuntimeMethodHandle._InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeType typeOwner)
> at System.RuntimeMethodHandle.InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeType typeOwner)
> at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)
> at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
> at Lucene.Net.Analysis.Snowball.SnowballFilter.Next()
> at Lucene.Net.Analysis.Snowball.SnowballFilter.Next()
> at Lucene.Net.Analysis.TokenStream.IncrementToken()
> at Lucene.Net.Index.DocInverterPerField.ProcessFields(Fieldable[] fields, Int32 count)
> at Lucene.Net.Index.DocFieldProcessorPerThread.ProcessDocument()
> at Lucene.Net.Index.DocumentsWriter.UpdateDocument(Document doc, Analyzer analyzer, Term delTerm)
> at Lucene.Net.Index.IndexWriter.AddDocument(Document doc, Analyzer analyzer)
>
> Is there a separate list for contrib/snowball issues? I have not yet been able to reproduce the problem in a small test case, however. Have there been any such issues with the stemmers in the past?
>
> Thanks,
> Bob