[ https://issues.apache.org/jira/browse/LUCENE-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602716#comment-14602716 ]

Christoph Kaser commented on LUCENE-6586:
-----------------------------------------

Hi Michael,

I tried to write a small test case and realized that there is no input that 
leads to a wrong token.
substCount is only used to determine how long the original input was, because 
some suffixes are only stripped if the token has a minimum length.

{code}
if ( ( buffer.length() + substCount > 5 ) &&
    buffer.substring( buffer.length() - 2, buffer.length() ).equals( "nd" ) )
{
  buffer.delete( buffer.length() - 2, buffer.length() );
}
{code}
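A minimal sketch of how substCount compensates for the shortening, assuming the buffer has already been through the substitution step and that {{$}} stands for a substituted "sch" (2 characters removed). {{StripNdSketch}} and {{stripNd}} are hypothetical names; only the if-condition is taken from the snippet above.

{code:java}
// Hypothetical helper: wraps the quoted "nd" condition so it can be run on its own.
// Assumes the buffer already went through substitution and substCount holds
// the number of characters that substitution removed.
class StripNdSketch {

    // Strips a trailing "nd" if the original word (current length plus the
    // characters removed by substitution) was longer than 5 characters.
    static String stripNd(String word, int substCount) {
        StringBuilder buffer = new StringBuilder(word);
        // length >= 2 guard added for this standalone sketch only
        if (buffer.length() >= 2
                && buffer.length() + substCount > 5
                && buffer.substring(buffer.length() - 2, buffer.length()).equals("nd")) {
            buffer.delete(buffer.length() - 2, buffer.length());
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // "abend": 5 characters, nothing substituted, 5 > 5 is false, so it is kept.
        System.out.println(stripNd("abend", 0)); // abend
        // "$und" stands for "schund" after "sch" -> "$": 4 + 2 = 6 > 5, so "nd" goes.
        System.out.println(stripNd("$und", 2));  // $u
        // Without substCount the same buffer would look too short to strip.
        System.out.println(stripNd("$und", 0));  // $und
    }
}
{code}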

However, every substitution leaves at least one character. For the bug to take 
effect, there has to be a substitution before the one that sets substCount to 2 
(instead of incrementing it by 2).
So we have
- 2 characters that were left by the (at least 2) substitutions
- the suffix "nd" (2 characters)
- substCount, which was set to 2
That sums to at least 6, which is greater than 5.

The other conditions that check substCount work the same way, except that they 
test for greater than 4.

Therefore, there is no token that triggers any wrong behaviour.
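The argument can also be checked mechanically on a toy model. This is a minimal sketch under heavy simplifying assumptions, not the real GermanStemmer: the hypothetical {{ToyStemmerCheck}} models only two substitutions ("sch" to {{$}} adding 2, "ch" to {{§}} adding 1) and evaluates the "nd" rule (> 5) and an "em" rule (> 4) once instead of running the full strip loop. It enumerates all short words over a small alphabet and counts how often the buggy and fixed substCount differ, and how often that changes a stripping decision.

{code:java}
// Toy model of the substitution step and the length checks, NOT the real class.
// Only "sch" -> '$' (+2) and "ch" -> '\u00a7' (+1) are modelled, and only one
// pass of the "nd" (> 5) and "em" (> 4) rules is evaluated.
// ToyStemmerCheck and all of its methods are hypothetical names.
class ToyStemmerCheck {

    // Applies the toy substitutions to buffer in place and returns substCount.
    // buggy == true reproduces the reported "substCount =+ 2" typo.
    static int substitute(StringBuilder buffer, boolean buggy) {
        int substCount = 0;
        for (int c = 0; c < buffer.length(); c++) {
            if (c < buffer.length() - 2 && buffer.charAt(c) == 's'
                    && buffer.charAt(c + 1) == 'c' && buffer.charAt(c + 2) == 'h') {
                buffer.setCharAt(c, '$');
                buffer.delete(c + 1, c + 3);
                if (buggy) {
                    substCount =+ 2; // typo: assigns +2, discarding earlier counts
                } else {
                    substCount += 2; // fix: adds the 2 removed characters
                }
            } else if (c < buffer.length() - 1 && buffer.charAt(c) == 'c'
                    && buffer.charAt(c + 1) == 'h') {
                buffer.setCharAt(c, '\u00a7'); // placeholder for "ch"
                buffer.deleteCharAt(c + 1);
                substCount++;
            }
        }
        return substCount;
    }

    // Returns which length-dependent suffix rule fires first: "nd", "em" or "none".
    static String decision(String word, boolean buggy) {
        StringBuilder buffer = new StringBuilder(word);
        int substCount = substitute(buffer, buggy);
        int len = buffer.length();
        if (len <= 3) {
            return "none"; // assumed guard: only buffers longer than 3 are stripped
        }
        String tail = buffer.substring(len - 2);
        if (len + substCount > 5 && tail.equals("nd")) {
            return "nd";
        }
        if (len + substCount > 4 && tail.equals("em")) {
            return "em";
        }
        return "none";
    }

    // Visits every word over alphabet with length 1..maxLen.
    // result[0] counts words whose substCount differs between buggy and fixed,
    // result[1] counts words whose suffix decision differs.
    static void enumerate(String prefix, String alphabet, int maxLen, int[] result) {
        if (!prefix.isEmpty()) {
            int buggyCount = substitute(new StringBuilder(prefix), true);
            int fixedCount = substitute(new StringBuilder(prefix), false);
            if (buggyCount != fixedCount) {
                result[0]++;
            }
            if (!decision(prefix, true).equals(decision(prefix, false))) {
                result[1]++;
            }
        }
        if (prefix.length() < maxLen) {
            for (int i = 0; i < alphabet.length(); i++) {
                enumerate(prefix + alphabet.charAt(i), alphabet, maxLen, result);
            }
        }
    }

    static int[] check(String alphabet, int maxLen) {
        int[] result = new int[2];
        enumerate("", alphabet, maxLen, result);
        return result;
    }

    public static void main(String[] args) {
        int[] result = check("schndem", 7);
        System.out.println("substCount differs for " + result[0] + " words");
        System.out.println("suffix decision differs for " + result[1] + " words");
    }
}
{code}

In this toy model the first count should be positive (e.g. "chschnd" yields substCount 2 instead of 3) while the second should stay 0, which matches the reasoning above; it says nothing about substitutions or rules the model leaves out.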

Still, I think the typo should be fixed, because it might be copied to a place 
where it has an effect.
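For reference, a minimal sketch of why the typo compiles at all: {{substCount =+ 2}} is parsed as {{substCount = +2}}, a plain assignment of the unary-plus expression, rather than the compound assignment {{+=}}. {{CompoundAssignDemo}}, {{buggy}} and {{fixed}} are hypothetical names used only for illustration.

{code:java}
// Hypothetical demo class contrasting "=+" (assignment of +2) with "+=".
class CompoundAssignDemo {

    static int buggy(int substCount) {
        substCount =+ 2; // parsed as substCount = +2; the previous value is lost
        return substCount;
    }

    static int fixed(int substCount) {
        substCount += 2; // compound assignment: substCount = substCount + 2
        return substCount;
    }

    public static void main(String[] args) {
        // One earlier substitution (substCount == 1), then a "sch" substitution.
        System.out.println(buggy(1)); // 2
        System.out.println(fixed(1)); // 3
        // With no earlier substitution both forms agree.
        System.out.println(buggy(0) == fixed(0)); // true
    }
}
{code}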

> There is a typo in GermanStemmer that can lead to wrong stemming
> ----------------------------------------------------------------
>
>                 Key: LUCENE-6586
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6586
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: 5.2.1
>            Reporter: Christoph Kaser
>            Priority: Minor
>
> There is a small typo in GermanStemmer that leads to a wrong calculation of 
> the substCount in line 203:
> {code}substCount =+ 2;{code}
> should be
> {code}substCount += 2;{code}
> I created a Pull Request for this some time ago, but it was apparently 
> overlooked:
> https://github.com/apache/lucene-solr/pull/141



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
