[
https://issues.apache.org/jira/browse/LUCENE-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602716#comment-14602716
]
Christoph Kaser commented on LUCENE-6586:
-----------------------------------------
Hi Michael,
I tried to write a small test case and realized that there is no input that
leads to a wrong token.
substCount is only used to decide how large the original input was, because
some suffixes are only stripped if the token has a minimum length.
{code}
if ( ( buffer.length() + substCount > 5 ) &&
buffer.substring( buffer.length() - 2, buffer.length() ).equals( "nd" ) )
{
buffer.delete( buffer.length() - 2, buffer.length() );
}
{code}
However, every substitution leaves at least one character. For the bug to take
effect, there has to be a substitution before the one that sets substCount to 2
(instead of incrementing it by 2).
So we have
- 2 characters that were left by the (at least 2) substitutions
- the suffix "nd"
- substCount, which was set to 2
That sums to 6, which is greater than 5.
The other conditions that check on substCount work the same, except they check
for greater than 4.
Therefore, there is no token that triggers any wrong behaviour.
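
The arithmetic above can be written out as a small sketch. This is not code from GermanStemmer: the class name, the helper minLengthCheckValue and its parameter are hypothetical and only restate the lower bound argued above for the case where the typo has reset substCount to 2.

```java
public class SubstCountBound {

    // Hypothetical helper (not part of Lucene): the smallest possible value of
    // buffer.length() + substCount when the buffer ends in a suffix of the
    // given length and the typo has reset substCount to 2.
    static int minLengthCheckValue(int suffixLength) {
        // The typo only matters after at least two substitutions,
        // and every substitution leaves at least one character behind.
        int charsLeftBySubstitutions = 2;
        // "substCount =+ 2" assigns 2 instead of adding 2.
        int substCountAfterTypo = 2;
        return charsLeftBySubstitutions + suffixLength + substCountAfterTypo;
    }

    public static void main(String[] args) {
        int value = minLengthCheckValue("nd".length());
        // The "nd" branch requires buffer.length() + substCount > 5.
        System.out.println(value + " > 5 is " + (value > 5));
    }
}
```

With the two-character suffix "nd" the bound is 6, so the "> 5" check passes whether substCount was assigned or incremented.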
Still, I think the typo should be fixed, because it might be copied to a place
where it has an effect.
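
For reference, a minimal standalone sketch of why the typo compiles and how it differs from the intended operator. The class and method names are made up for illustration. Java parses "x =+ 2" as "x = (+2)", a plain assignment with a unary plus, so the compiler accepts it without complaint.

```java
public class UnaryPlusTypo {

    // Applies the mistyped operator twice. Each statement is parsed as
    // "substCount = (+2)", so the earlier value is overwritten.
    static int withTypo() {
        int substCount = 0;
        substCount =+ 2;
        substCount =+ 2;
        return substCount;
    }

    // Applies the intended compound assignment twice, so the count accumulates.
    static int withFix() {
        int substCount = 0;
        substCount += 2;
        substCount += 2;
        return substCount;
    }

    public static void main(String[] args) {
        System.out.println("typo: " + withTypo() + ", fix: " + withFix());
    }
}
```

After two substitutions the mistyped version reports 2 and the fixed version reports 4. That difference is harmless for the length checks discussed above, but it would not be in code with a tighter threshold.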
> There is a typo in GermanStemmer that can lead to wrong stemming
> ----------------------------------------------------------------
>
> Key: LUCENE-6586
> URL: https://issues.apache.org/jira/browse/LUCENE-6586
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/analysis
> Affects Versions: 5.2.1
> Reporter: Christoph Kaser
> Priority: Minor
>
> There is a small typo in GermanStemmer that leads to a wrong calculation of
> the substCount in line 203:
> {code}substCount =+ 2;{code}
> should be
> {code}substCount += 2;{code}
> I created a Pull Request for this some time ago, but it was apparently
> overlooked:
> https://github.com/apache/lucene-solr/pull/141