Hi Digy,

On 13.09.2011 22:12, Digy wrote:
I created a working Portuguese stemmer
(http://people.apache.org/~digy/PortugueseStemmerNew.cs) from
   http://snowball.tartarus.org/archives/snowball-discuss/0943.html

http://snowball.tartarus.org/archives/snowball-discuss/att-0943/01-SnowballCSharp.zip

Since it has a BSD license (http://snowball.tartarus.org/license.php), I
don't think I can use it to update PortugueseStemmer.cs under contrib.

Snowball from Tartarus seems to be in Lucene Core:

http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_3_3/lucene/contrib/analyzers/common/src/java/org/tartarus/snowball/ext/

under the old BSD license:

http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_3_3/lucene/contrib/analyzers/common/src/java/org/tartarus/snowball/Among.java?revision=1141402&view=markup

Robert


DIGY

-----Original Message-----
From: Robert Stewart [mailto:[email protected]]
Sent: Tuesday, September 13, 2011 5:55 PM
To: <[email protected]>
Subject: Re: [Lucene.Net] Test case for: possible infinite loop bug in portuguese snowball stemmer?

Here is a test case:

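using System.IO; // for StringReader

string text = @"Califórnia";

// KeywordTokenizer emits the entire input as a single token
Lucene.Net.Analysis.KeywordTokenizer tokenizer =
        new Lucene.Net.Analysis.KeywordTokenizer(new StringReader(text));

// Wrap the tokenizer in the Portuguese Snowball stemmer
Lucene.Net.Analysis.Snowball.SnowballFilter stemmer =
        new Lucene.Net.Analysis.Snowball.SnowballFilter(tokenizer, "Portuguese");

Lucene.Net.Analysis.Token token;

// Print each stemmed term; for this input the first Next() call never returns
while ((token = stemmer.Next()) != null)
{
        System.Console.WriteLine(token.Term());
}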

It seems to go into an infinite loop: the call to stemmer.Next() never
returns. I'm not sure whether this is the only stemmer I am having trouble
with, but it happens to us on a near-daily basis.

Thanks,
Bob


On Sep 13, 2011, at 9:37 AM, Robert Stewart wrote:

Are there any known issues with Snowball stemmers (Portuguese in
particular) going into an infinite loop? I have a recurring problem where
IndexWriter locks up in AddDocument and never returns (it has taken up to
3 days before we realized it), requiring the process to be killed manually.
From what I can tell so far, it seems to happen only on Portuguese
documents, and the stack trace when the thread is aborted is always as
follows:

System.Threading.ThreadAbortException: Thread was being aborted.
   at System.RuntimeMethodHandle._InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeType typeOwner)
   at System.RuntimeMethodHandle.InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeType typeOwner)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at Lucene.Net.Analysis.Snowball.SnowballFilter.Next()
System.SystemException: System.Threading.ThreadAbortException: Thread was being aborted.
   at System.RuntimeMethodHandle._InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeType typeOwner)
   at System.RuntimeMethodHandle.InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeType typeOwner)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at Lucene.Net.Analysis.Snowball.SnowballFilter.Next()
   at Lucene.Net.Analysis.Snowball.SnowballFilter.Next()
   at Lucene.Net.Analysis.TokenStream.IncrementToken()
   at Lucene.Net.Index.DocInverterPerField.ProcessFields(Fieldable[] fields, Int32 count)
   at Lucene.Net.Index.DocFieldProcessorPerThread.ProcessDocument()
   at Lucene.Net.Index.DocumentsWriter.UpdateDocument(Document doc, Analyzer analyzer, Term delTerm)
   at Lucene.Net.Index.IndexWriter.AddDocument(Document doc, Analyzer analyzer)


Is there a separate list for contrib/snowball issues? I have not yet been
able to reproduce this with a small test case, however. Have there been any
similar issues with the stemmers in the past?

Thanks,
Bob
