Christopher Ball created OPENNLP-1518:
-----------------------------------------
Summary: Roberta-based Models - Add support for utilization via Onnx
Key: OPENNLP-1518
URL: https://issues.apache.org/jira/browse/OPENNLP-1518
Project: OpenNLP
Issue Type: Improvement
Components: language model
Affects Versions: 2.3.0
Reporter: Christopher Ball
It appears that *RoBERTa*-based models do not work with *OpenNLP* via *ONNX*.
*Example Model*
[https://huggingface.co/SamLowe/roberta-base-go_emotions-onnx]
*Comments from Jeff Zemerick*
It looks like there are some differences with this model:
{*}First{*}, the vocab file is a JSON file, but OpenNLP expects a plain-text file
with one token per line, so it cannot load the vocab file.
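Converting the JSON vocab into the one-token-per-line layout is a mechanical step. Below is a minimal, dependency-free Java sketch, assuming the vocab is a flat JSON object mapping each token to an integer ID (as in a Hugging Face `vocab.json`) and assuming OpenNLP assigns each token the ID of its zero-based line number. `VocabJsonToText` is a hypothetical helper, not part of OpenNLP. It only fixes the file format; it does not change how OpenNLP splits text into tokens.

```java
import java.util.Map;
import java.util.TreeMap;

public class VocabJsonToText {

    // Converts a flat JSON vocab {"token": id, ...} into plain text with one token per line,
    // ordered so that the zero-based line number equals the token ID (assumed OpenNLP layout).
    public static String convert(String json) {
        StringBuilder out = new StringBuilder();
        int expected = 0;
        for (Map.Entry<Integer, String> e : parse(json).entrySet()) {
            // A gap in the IDs would shift every later token onto the wrong line.
            if (e.getKey() != expected) throw new IllegalArgumentException("Missing token ID " + expected);
            // A token containing a line break cannot be stored one per line.
            if (e.getValue().indexOf('\n') >= 0 || e.getValue().indexOf('\r') >= 0) {
                throw new IllegalArgumentException("Token " + expected + " contains a line break");
            }
            out.append(e.getValue()).append('\n');
            expected++;
        }
        return out.toString();
    }

    // Parses the JSON object into an ID -> token map, sorted by ID.
    static TreeMap<Integer, String> parse(String json) {
        Parser p = new Parser(json);
        TreeMap<Integer, String> byId = new TreeMap<>();
        p.expect('{');
        p.skipWs();
        if (p.i < json.length() && json.charAt(p.i) == '}') return byId;
        while (true) {
            String token = p.readString();
            p.expect(':');
            int id = p.readInt();
            if (byId.put(id, token) != null) throw new IllegalArgumentException("Duplicate token ID " + id);
            p.skipWs();
            char c = p.next();
            if (c == '}') return byId;
            if (c != ',') throw new IllegalArgumentException("Expected ',' or '}' at offset " + (p.i - 1));
        }
    }

    // Minimal JSON reader: strings with standard escapes and integer numbers only.
    private static final class Parser {
        private final String s;
        private int i = 0;

        Parser(String s) { this.s = s; }

        void skipWs() {
            while (i < s.length() && Character.isWhitespace(s.charAt(i))) i++;
        }

        char next() {
            if (i >= s.length()) throw new IllegalArgumentException("Unexpected end of input");
            return s.charAt(i++);
        }

        void expect(char c) {
            skipWs();
            if (next() != c) throw new IllegalArgumentException("Expected '" + c + "' at offset " + (i - 1));
        }

        String readString() {
            expect('"');
            StringBuilder sb = new StringBuilder();
            while (true) {
                char c = next();
                if (c == '"') return sb.toString();
                if (c != '\\') { sb.append(c); continue; }
                char esc = next();
                switch (esc) {
                    case '"': case '\\': case '/': sb.append(esc); break;
                    case 'b': sb.append('\b'); break;
                    case 'f': sb.append('\f'); break;
                    case 'n': sb.append('\n'); break;
                    case 'r': sb.append('\r'); break;
                    case 't': sb.append('\t'); break;
                    case 'u': // four hex digits encode one UTF-16 code unit
                        if (i + 4 > s.length()) throw new IllegalArgumentException("Truncated escape");
                        sb.append((char) Integer.parseInt(s.substring(i, i + 4), 16));
                        i += 4;
                        break;
                    default: throw new IllegalArgumentException("Invalid escape: " + esc);
                }
            }
        }

        int readInt() {
            skipWs();
            int start = i;
            if (i < s.length() && s.charAt(i) == '-') i++;
            while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') i++;
            return Integer.parseInt(s.substring(start, i));
        }
    }
}
```

Sorting by ID before writing matters because JSON object order is not guaranteed to match ID order.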
{*}Second{*}, the model does not expect a token type ID input. By default,
OpenNLP includes that input, but you can turn it off in the InferenceOptions
class (set includeTokenTypeIds to false).
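To make the second point concrete, here is a minimal sketch of how the named model inputs might be assembled for one sequence, with `token_type_ids` added only when requested. `ModelInputs` is hypothetical, and the input names `input_ids`, `attention_mask`, and `token_type_ids` are assumptions based on the usual Hugging Face export convention; check the names your ONNX model actually declares. Plain `long[][]` arrays stand in for the ONNX Runtime tensors OpenNLP would create.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class ModelInputs {

    // Builds the named inputs for a single encoded sequence (batch size 1).
    // Input names are assumed, not read from the model.
    public static Map<String, long[][]> build(long[] inputIds, boolean includeTokenTypeIds) {
        Map<String, long[][]> inputs = new LinkedHashMap<>();
        inputs.put("input_ids", new long[][] {inputIds});

        // Attend to every token in the (unpadded) sequence.
        long[] mask = new long[inputIds.length];
        Arrays.fill(mask, 1L);
        inputs.put("attention_mask", new long[][] {mask});

        // Only add token type IDs (all zeros for a single segment) when the model expects them;
        // this mirrors the effect of setting includeTokenTypeIds to false in InferenceOptions.
        if (includeTokenTypeIds) {
            inputs.put("token_type_ids", new long[][] {new long[inputIds.length]});
        }
        return inputs;
    }
}
```

With `includeTokenTypeIds` set to `false`, only `input_ids` and `attention_mask` are passed, which matches a model that does not declare a token type input.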
{*}Third{*}, the model returns a one-dimensional array, whereas OpenNLP
expects a two-dimensional array.
My guess is that it would work with those changes. Nothing else jumps out at me.
If you get time, please feel free to write those up as OpenNLP Jira tickets and
send me the links. We can easily support a JSON vocab file, and we can also
support a 1-d array being returned, but I might need to see how best to support
models that return either a 1-d or a 2-d array.
It might be a matter of checking whether the output is 1-d or 2-d and going from there.
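The "check whether it's 1-d or 2-d" idea can be sketched as a small normalization step. This assumes the scores are read from ONNX Runtime's `OnnxTensor.getValue()`, which returns an `Object` whose runtime type (`float[]`, `float[][]`, ...) follows the tensor's shape, and assumes a 1-d result holds the scores for a single input. `OutputShapes.toBatchScores` is a hypothetical helper, not an existing OpenNLP method.

```java
public class OutputShapes {

    // Returns model scores as [batch][labels], accepting either a 1-d float[]
    // (assumed to be one row of scores) or an already 2-d float[][].
    public static float[][] toBatchScores(Object output) {
        if (output instanceof float[][]) {
            return (float[][]) output;
        }
        if (output instanceof float[]) {
            // Treat a 1-d output as a batch containing a single row.
            return new float[][] {(float[]) output};
        }
        throw new IllegalArgumentException("Unsupported output type: "
            + (output == null ? "null" : output.getClass().getName()));
    }
}
```

Wrapping a 1-d result as a one-row batch lets the existing 2-d code path stay unchanged, rather than duplicating the scoring logic for each shape.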
*Stack Trace*
java.lang.NullPointerException: Cannot invoke "java.lang.Integer.intValue()" because the return value of "java.util.Map.get(Object)" is null
    at opennlp.dl.doccat.DocumentCategorizerDL.tokenize(DocumentCategorizerDL.java:281)
    at opennlp.dl.doccat.DocumentCategorizerDL.categorize(DocumentCategorizerDL.java:104)
    at opennlp.dl.doccat.DocumentCategorizerDL.sortedScoreMap(DocumentCategorizerDL.java:192)
    at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.inferContent(Explore_Emotions_with_Onnx_OpenNLP.scala:47)
    at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.$anonfun$main$1(Explore_Emotions_with_Onnx_OpenNLP.scala:37)
    at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.$anonfun$main$1$adapted(Explore_Emotions_with_Onnx_OpenNLP.scala:36)
    at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:934)
    at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.main(Explore_Emotions_with_Onnx_OpenNLP.scala:36)
    at org.index.app.Explore_Emotions_with_Onnx_OpenNLP.main(Explore_Emotions_with_Onnx_OpenNLP.scala)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0
    at opennlp.dl.doccat.DocumentCategorizerDL.sortedScoreMap(DocumentCategorizerDL.java:198)
    at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.inferContent(Explore_Emotions_with_Onnx_OpenNLP.scala:47)
    at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.$anonfun$main$1(Explore_Emotions_with_Onnx_OpenNLP.scala:37)
    at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.$anonfun$main$1$adapted(Explore_Emotions_with_Onnx_OpenNLP.scala:36)
    at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:934)
    at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.main(Explore_Emotions_with_Onnx_OpenNLP.scala:36)
    at org.index.app.Explore_Emotions_with_Onnx_OpenNLP.main(Explore_Emotions_with_Onnx_OpenNLP.scala)
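The first trace ends in a `NullPointerException` in `DocumentCategorizerDL.tokenize`: a `Map.get` lookup returned `null`, which was then unboxed with `intValue()`. That is consistent with the token-to-ID vocabulary either missing a token or not having loaded at all. The sketch below shows a defensive lookup; `VocabLookup.toIds` is hypothetical, and the unknown-token string is an assumption you pass in (`<unk>` for RoBERTa-style vocabularies, `[UNK]` for BERT-style ones).

```java
import java.util.List;
import java.util.Map;

public class VocabLookup {

    // Maps tokens to IDs without unboxing a null: tokens missing from the vocabulary
    // fall back to the unknown-token ID instead of throwing a NullPointerException.
    public static int[] toIds(List<String> tokens, Map<String, Integer> vocab, String unkToken) {
        Integer unkId = vocab.get(unkToken);
        if (unkId == null) {
            // If even the unknown token is absent, the vocabulary most likely failed to load,
            // so fail loudly rather than silently mapping everything to one ID.
            throw new IllegalStateException("Vocabulary has no entry for " + unkToken
                + "; was it loaded from the expected format?");
        }
        int[] ids = new int[tokens.size()];
        for (int i = 0; i < ids.length; i++) {
            Integer id = vocab.get(tokens.get(i));
            ids[i] = (id != null) ? id : unkId;
        }
        return ids;
    }
}
```

Failing loudly when the unknown token itself is missing keeps an unreadable vocab file (such as the JSON vocab above) from being masked as a model that simply predicts poorly.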