Control over threading is not required to "share the model between threads and create one instance of the component per thread".
One could use a scope where variable references are guaranteed to be stored
in the call stack (say method-local variables in Java).

You could then:

a) Instantiate the components on the call stack.
b) Instantiate the models in constructors or the factory methods of a
   singleton.

If one were using OpenNLP in a Tomcat webapp, for instance, one could, I
believe, use this method.

Cohan Sujay Carlos

On Wed, Jan 11, 2017 at 7:08 PM, Thilo Goetz <[email protected]> wrote:

> Correct me if I'm wrong, but that approach only works if you control the
> thread creation yourself. In my case, for example, I was using Scala's
> parallel collection API, and had no control over the threading. I will
> usually want to create one service that does tokenization or POS tagging or
> whatever, which can be accessed by many threads. I don't want to have to
> mess around with an object pool, or thread locals, or anything like that.
> Especially since there is really no good reason IMHO. You could very easily
> just return the probabilities together with the spans, and whoever doesn't
> need them can ignore them. Or have two methods, one with probabilities, one
> without. Maybe it's just where I'm coming from, but I fail to see the
> advantages of the current approach.
>
> --Thilo
>
> On 11/01/2017 13:58, Joern Kottmann wrote:
>
>> Hello Thilo,
>>
>> I am interested in your opinion about how this is done currently.
>> We say: "Share the model between threads and create one instance of the
>> component per thread".
>>
>> Wouldn't that work well in your use case?
>>
>> Jörn
>>
>> On Wed, Jan 11, 2017 at 11:05 AM, Thilo Goetz <[email protected]> wrote:
>>
>> Hi,
>>>
>>> in a recent project, I was using SentenceDetectorME, TokenizerME and
>>> POSTaggerME. It turns out that none of those is thread safe. This is
>>> because the classification probabilities for the last tag() call (for
>>> example) are stored in a member variable and can be retrieved by a
>>> separate API call.
>>>
>>> I'm planning to build thread safe versions for myself, and I'd be happy
>>> to contribute a patch if there is interest. This could be done as a
>>> conservative extension with an additional method such as tagReentrant,
>>> where the old API calls would continue to work as before and would still
>>> not be thread safe. Alternatively, one could remodel the API so that
>>> everything was thread safe, but that would break backwards compatibility.
>>>
>>> Final question: if I do this for the classes mentioned above, are there
>>> other tools that should be made thread safe while we're at it?
>>>
>>> Opinions?
>>>
>>> --Thilo
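
For illustration, the method-local approach from the top of this message
could look roughly like the sketch below. It deliberately does not use the
real OpenNLP classes: Model, Tagger, Models and tagSentence are hypothetical
stand-ins for a shareable model, a stateful component that is not thread
safe, a singleton-style holder, and a service method. A Java parallel stream
plays the part of a framework that creates the threads for you. The sketch
also assumes that constructing a component from an already loaded model is
cheap compared to the work done per call.

```java
import java.util.List;
import java.util.stream.Collectors;

public class PerCallComponentSketch {

    /** Hypothetical stand-in for an immutable model that is safe to share. */
    static final class Model {
        private final String tag;
        Model(String tag) { this.tag = tag; }
        String tagFor(String token) { return token + "/" + tag; }
    }

    /** Hypothetical stand-in for a stateful component that is NOT thread safe. */
    static final class Tagger {
        private final Model model;
        private String[] lastResult; // mutable state kept from the last tag() call

        Tagger(Model model) { this.model = model; }

        String[] tag(String[] tokens) {
            String[] out = new String[tokens.length];
            for (int i = 0; i < tokens.length; i++) {
                out[i] = model.tagFor(tokens[i]);
            }
            lastResult = out;
            return out;
        }

        String[] lastResult() { return lastResult; }
    }

    /** b) The model is created once, singleton style, and shared by all threads. */
    static final class Models {
        static final Model SHARED = new Model("X");
        private Models() {}
    }

    /** a) The component is referenced only from a local variable of this call. */
    static String tagSentence(String sentence) {
        // New instance per call: no other thread can reach this reference.
        Tagger tagger = new Tagger(Models.SHARED);
        tagger.tag(sentence.split(" "));
        // Reading the stored state is safe because the instance is private to this call.
        return String.join(" ", tagger.lastResult());
    }

    public static void main(String[] args) {
        List<String> sentences = List.of("a b", "c d e", "f");
        // parallelStream() decides which threads run the work; the caller creates none.
        List<String> tagged = sentences.parallelStream()
                .map(PerCallComponentSketch::tagSentence)
                .collect(Collectors.toList());
        System.out.println(tagged);
    }
}
```

The trade-off is one component allocation per call. If that turned out to be
too expensive for a real component, a pool or thread-local would be the
usual alternatives, which is exactly what Thilo would like to avoid.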
