[ https://issues.apache.org/jira/browse/OPENNLP-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jon Marius Venstad updated OPENNLP-1520:
----------------------------------------
Description:
Recursive stemming, which is the intended use of the {{methodObject}} and
{{method}} members of the {{Among}} class (called reflectively), is completely
broken, although it seems hard to actually trigger. First, it tries to invoke a
private method from outside its class (from the parent class,
{{SnowballProgram}}), which fails with an {{IllegalAccessException}}. Second,
even if that worked, it would invoke _all_ such methods on the _same, shared,
static object_, not on the relevant stemmer instance.
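Both problems can be reproduced with a small, self-contained sketch. It is a hypothetical simplification modelled on how the old generated code appears to work (a {{java.lang.reflect.Method}} looked up by name and invoked on a stored object), not the actual OpenNLP classes: {{Entry}}, {{Program}}, {{DemoStemmer}}, {{r_private}} and {{r_public}} are made-up names standing in for {{Among}}, {{SnowballProgram}}, a generated stemmer and its routines. The parent class performs the reflective call, so a private callback is rejected, and a public one runs on the shared static instance instead of the stemmer that made the call:

```java
import java.lang.reflect.Method;

// Hypothetical, simplified stand-in for the old Among: it keeps a Method
// looked up by name plus the object that method will be invoked on.
final class Entry {
    final Method method;
    final Object methodObject;

    Entry(String methodName, Object methodObject) {
        try {
            // getDeclaredMethod finds private methods too, but finding a method
            // does not grant permission to invoke it from another class.
            this.method = methodObject.getClass().getDeclaredMethod(methodName);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
        this.methodObject = methodObject;
    }
}

// Hypothetical stand-in for SnowballProgram, the parent class that performs
// the reflective call (as find_among_b does in the stack trace).
abstract class Program {
    String callEntry(Entry entry) {
        try {
            // Method.invoke checks access against the calling class (Program),
            // and it runs on the object stored in the entry, not on `this`.
            return String.valueOf(entry.method.invoke(entry.methodObject));
        } catch (ReflectiveOperationException e) {
            return e.getClass().getSimpleName();
        }
    }
}

// Hypothetical stand-in for a generated stemmer such as finnishStemmer.
final class DemoStemmer extends Program {
    // One shared instance, created once per class, used by every entry.
    private static final DemoStemmer methodObject = new DemoStemmer();
    static final Entry PRIVATE_ENTRY = new Entry("r_private", methodObject);
    static final Entry PUBLIC_ENTRY = new Entry("r_public", methodObject);

    int calls = 0;

    private boolean r_private() { calls++; return true; }

    public boolean r_public() { calls++; return true; }

    static int sharedCalls() { return methodObject.calls; }
}

public class ReflectionBugDemo {
    public static void main(String[] args) {
        DemoStemmer stemmer = new DemoStemmer();
        // Private callback: the parent class may not invoke it.
        System.out.println(stemmer.callEntry(DemoStemmer.PRIVATE_ENTRY)); // IllegalAccessException
        // Public callback: succeeds, but runs on the shared static object.
        System.out.println(stemmer.callEntry(DemoStemmer.PUBLIC_ENTRY)); // true
        // The calling stemmer was never touched; the shared one was.
        System.out.println(stemmer.calls + " " + DemoStemmer.sharedCalls()); // 0 1
    }
}
```

Making the callback public would only trade the exception for the second, quieter bug: state changes land on the static instance.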
This was fixed in Snowball's Java code generation 8 years ago, but the generated
code in opennlp-tools appears to be about 10 years old. I would urge you to
regenerate that code.
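One way to avoid both problems is sketched below, assuming Java 8 or later. It is an illustration only, not necessarily what the linked Snowball commit generates, and {{HandleEntry}}, {{HandleProgram}}, {{FixedDemoStemmer}} and {{runPrivate}} are hypothetical names. The callback is looked up through a {{java.lang.invoke.MethodHandles.Lookup}} created inside the stemmer class, which is allowed to see that class's private methods, and the parent class passes {{this}} as the receiver:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical entry type: stores a MethodHandle instead of (Method, object).
final class HandleEntry {
    final MethodHandle method;

    HandleEntry(String methodName, MethodHandles.Lookup lookup) {
        try {
            // Access is checked here, against the class that created the
            // lookup. The resulting handle takes the receiver as its argument.
            this.method = lookup.findVirtual(lookup.lookupClass(), methodName,
                    MethodType.methodType(boolean.class));
        } catch (NoSuchMethodException | IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical parent class playing the role of SnowballProgram.
abstract class HandleProgram {
    boolean callEntry(HandleEntry entry) {
        try {
            // Invoke on this instance, not on a shared static object.
            return (boolean) entry.method.invoke(this);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}

// Hypothetical generated stemmer.
final class FixedDemoStemmer extends HandleProgram {
    // The Lookup is created in this class, so it carries private access here.
    private static final HandleEntry PRIVATE_ENTRY =
            new HandleEntry("r_private", MethodHandles.lookup());

    int calls = 0;

    private boolean r_private() { calls++; return true; }

    boolean runPrivate() { return callEntry(PRIVATE_ENTRY); }
}

public class FixedDemo {
    public static void main(String[] args) {
        FixedDemoStemmer a = new FixedDemoStemmer();
        FixedDemoStemmer b = new FixedDemoStemmer();
        a.runPrivate();
        // Only the stemmer that made the call is affected.
        System.out.println(a.calls + " " + b.calls); // 1 0
    }
}
```

The design point is that the access check happens once, when the handle is created inside the class that owns the private method; every later call runs on whichever stemmer instance triggered it, so no state is shared between instances.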
Commit that fixed the Java code generation:
[https://github.com/snowballstem/snowball/commit/0f9d3d64ab965447a7f638b8ededc924f3efca75]
Example of an affected generated stemmer in OpenNLP:
[https://github.com/apache/opennlp/blob/main/opennlp-tools/src/main/java/opennlp/tools/stemmer/snowball/finnishStemmer.java]
Stack trace showing illegal reflection access:
{noformat}
2023-10-26 23:21:44.200 class opennlp.tools.stemmer.snowball.SnowballProgram cannot access a member of class opennlp.tools.stemmer.snowball.finnishStemmer with modifiers "private"
exception=java.lang.IllegalAccessException: class opennlp.tools.stemmer.snowball.SnowballProgram cannot access a member of class opennlp.tools.stemmer.snowball.finnishStemmer with modifiers "private"
    at java.base/jdk.internal.reflect.Reflection.newIllegalAccessException(Reflection.java:392)
    at java.base/java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:674)
    at java.base/java.lang.reflect.Method.invoke(Method.java:560)
    at opennlp.tools.stemmer.snowball.SnowballProgram.find_among_b(SnowballProgram.java:353)
    at opennlp.tools.stemmer.snowball.finnishStemmer.r_case_ending(finnishStemmer.java:480)
    at opennlp.tools.stemmer.snowball.finnishStemmer.stem(finnishStemmer.java:1003)
    at opennlp.tools.stemmer.snowball.SnowballStemmer.stem(SnowballStemmer.java:131)
    at com.yahoo.language.opennlp.OpenNlpTokenizer.processToken(OpenNlpTokenizer.java:64)
    at com.yahoo.language.opennlp.OpenNlpTokenizer.lambda$tokenize$0(OpenNlpTokenizer.java:54)
    at com.yahoo.language.simple.SimpleTokenizer.tokenize(SimpleTokenizer.java:74)
    at com.yahoo.language.opennlp.OpenNlpTokenizer.tokenize(OpenNlpTokenizer.java:54)
    at com.yahoo.vespa.indexinglanguage.linguistics.LinguisticsAnnotator.annotate(LinguisticsAnnotator.java:76)
    ...
{noformat}
Best, Jon Marius Venstad, developer at [vespa.ai|http://vespa.ai/]
> Generated Java code for stemmers is broken, and should be re-generated
> ----------------------------------------------------------------------
>
> Key: OPENNLP-1520
> URL: https://issues.apache.org/jira/browse/OPENNLP-1520
> Project: OpenNLP
> Issue Type: Bug
> Components: Stemmer
> Affects Versions: 2.3.0
> Reporter: Jon Marius Venstad
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)