[ 
https://issues.apache.org/jira/browse/OPENNLP-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zemerick reassigned OPENNLP-1518:
--------------------------------------

    Assignee: Jeff Zemerick

> Roberta-based Models - Add support for utilization via Onnx
> -----------------------------------------------------------
>
>                 Key: OPENNLP-1518
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1518
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: language model
>    Affects Versions: 2.3.0
>            Reporter: Christopher Ball
>            Assignee: Jeff Zemerick
>            Priority: Major
>
> It appears that *Roberta* based models do not work with *OpenNLP* via *ONNX*
>  
> *Example Model*
> [https://huggingface.co/SamLowe/roberta-base-go_emotions-onnx]
>  
> *Comments from Jeff Zemerick*
> Looks like some differences with the model:
> *First*, the vocab file is a JSON file. OpenNLP expects a plain text file 
> with one token per line, so it is not able to load the vocab file.
> *Second*, the model doesn't expect a token type ID param. By default, 
> OpenNLP includes that param, but you can tell it not to in the 
> InferenceOptions class. (Set includeTokenTypeIds to false.)
> *Third*, that model returns a one-dimensional array. OpenNLP 
> expects a two-dimensional array.
> I am guessing with those changes it would work. Nothing else jumps out at me. 
> If you get time, please feel free to write those up as OpenNLP jira tickets 
> and send me the links. We can easily support a JSON file for the vocab, and 
> we can also support the 1d array, but I might need to see how best to 
> support models that return either a 1d or a 2d array. 
> It might be a matter of checking whether it's 1d or 2d and going from there.
>  
> *Stack Trace*
> java.lang.NullPointerException: Cannot invoke "java.lang.Integer.intValue()" because the return value of "java.util.Map.get(Object)" is null
> at opennlp.dl.doccat.DocumentCategorizerDL.tokenize(DocumentCategorizerDL.java:281)
> at opennlp.dl.doccat.DocumentCategorizerDL.categorize(DocumentCategorizerDL.java:104)
> at opennlp.dl.doccat.DocumentCategorizerDL.sortedScoreMap(DocumentCategorizerDL.java:192)
> at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.inferContent(Explore_Emotions_with_Onnx_OpenNLP.scala:47)
> at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.$anonfun$main$1(Explore_Emotions_with_Onnx_OpenNLP.scala:37)
> at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.$anonfun$main$1$adapted(Explore_Emotions_with_Onnx_OpenNLP.scala:36)
> at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:934)
> at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.main(Explore_Emotions_with_Onnx_OpenNLP.scala:36)
> at org.index.app.Explore_Emotions_with_Onnx_OpenNLP.main(Explore_Emotions_with_Onnx_OpenNLP.scala)
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0
> at opennlp.dl.doccat.DocumentCategorizerDL.sortedScoreMap(DocumentCategorizerDL.java:198)
> at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.inferContent(Explore_Emotions_with_Onnx_OpenNLP.scala:47)
> at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.$anonfun$main$1(Explore_Emotions_with_Onnx_OpenNLP.scala:37)
> at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.$anonfun$main$1$adapted(Explore_Emotions_with_Onnx_OpenNLP.scala:36)
> at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:934)
> at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.main(Explore_Emotions_with_Onnx_OpenNLP.scala:36)
> at org.index.app.Explore_Emotions_with_Onnx_OpenNLP.main(Explore_Emotions_with_Onnx_OpenNLP.scala)
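
On the first point in the quoted comments (a JSON vocab versus a plain text file with one token per line), a possible workaround until JSON is supported might look like the sketch below. It assumes the JSON vocab is a flat token-to-id object that has already been parsed into a Map with a JSON library of your choice, and that the ids run from 0 to n-1 without gaps. VocabLines and toLines are hypothetical names, not part of OpenNLP. The sketch covers only the file layout and says nothing about whether the tokenization itself matches what the model expects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of OpenNLP): token-to-id map -> one token per line. */
final class VocabLines {

  private VocabLines() {}

  /**
   * Returns the tokens ordered by id, so that the zero-based line index
   * equals the token id. Assumes ids are exactly 0..n-1; throws otherwise.
   */
  static List<String> toLines(Map<String, Integer> vocab) {
    List<Map.Entry<String, Integer>> entries = new ArrayList<>(vocab.entrySet());
    entries.sort(Map.Entry.comparingByValue());

    List<String> lines = new ArrayList<>(entries.size());
    for (Map.Entry<String, Integer> entry : entries) {
      // A gap or duplicate id would shift every following line, so fail early.
      if (entry.getValue() != lines.size()) {
        throw new IllegalArgumentException(
            "Expected id " + lines.size() + " but found " + entry.getValue()
                + " for token '" + entry.getKey() + "'");
      }
      lines.add(entry.getKey());
    }
    return lines;
  }
}
```

The resulting list can be written out with java.nio.file.Files.write(path, lines) to produce the plain text file.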

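On the third point (1d versus 2d output), the "check whether it's 1d or 2d and go from there" idea could be sketched roughly as follows. This is only an illustration, assuming the raw model output is available as a plain Java array object, either float[] or float[][]. ScoreShapes and asBatch are hypothetical names, and this is not how DocumentCategorizerDL is implemented.

```java
/** Hypothetical helper (not part of OpenNLP): normalizes 1d or 2d scores to 2d. */
final class ScoreShapes {

  private ScoreShapes() {}

  /**
   * Returns the scores as a 2d array. A float[][] is passed through unchanged,
   * and a float[] is wrapped as a batch containing a single row.
   */
  static float[][] asBatch(Object value) {
    if (value instanceof float[][]) {
      return (float[][]) value;
    }
    if (value instanceof float[]) {
      return new float[][] {(float[]) value};
    }
    // Anything else (including null) is not a shape this sketch knows about.
    String type = (value == null) ? "null" : value.getClass().getName();
    throw new IllegalArgumentException("Unsupported output type: " + type);
  }
}
```

Downstream code can then always read row 0 of the result, whichever shape the model produced.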


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
