+1 ease of use is important for us and has always been a strong focus here.
Jörn

On Wed, 2017-01-11 at 17:39 +0100, Thilo Goetz wrote:
> You can do all sorts of things. I implemented a version now that uses
> ThreadLocals. Works fine, but quite frankly, it's a pain in the butt.
> The world has been moving to multi-threaded for a long time now, and I
> think it's a very reasonable assumption that a simple tool like a POS
> tagger is thread safe, without me as an API user having to think
> about it.
>
> When I ran the OpenNLP stack multi-threaded and saw the exceptions, I
> read the signs and figured out what the issue was. Not everybody will
> be able to do that. They will just see that it crashes, and move on to
> a different tool. If the POS tagger can not be made thread safe, that's
> what I will do, actually.
>
> But, that's just my opinion. If your approach works for you, that's
> great.
>
>
> On 11/01/2017 16:05, Cohan Sujay Carlos wrote:
> > I meant:
> >
> > a) Instantiate the components in the local scope that leads to their
> > references being in the call (thread) stack.
> >
> >
> > On Wed, Jan 11, 2017 at 8:33 PM, Cohan Sujay Carlos <[email protected]>
> > wrote:
> >
> > > Control over threading is not required to "share the model between
> > > threads and create one instance of the component per thread".
> > >
> > > One could use a scope where variable references are guaranteed to
> > > be stored in the call stack (say method-local variables in Java).
> > >
> > > You could then:
> > >
> > > a) Instantiate the components on the call stack.
> > > b) Instantiate the models in constructors or the factory methods
> > > of a singleton.
> > >
> > > If one were using OpenNLP in a Tomcat webapp, for instance, one
> > > could, I believe, use this method.
> > >
> > > Cohan Sujay Carlos
> > >
> > >
> > > On Wed, Jan 11, 2017 at 7:08 PM, Thilo Goetz <[email protected]>
> > > wrote:
> > >
> > > > Correct me if I'm wrong, but that approach only works if you
> > > > control the thread creation yourself. In my case, for example,
> > > > I was using Scala's parallel collection API, and had no control
> > > > over the threading. I will usually want to create one service
> > > > that does tokenization or POS tagging or whatever, which can be
> > > > accessed by many threads. I don't want to have to mess around
> > > > with an object pool, or thread locals, or anything like that.
> > > > Especially since there is really no good reason IMHO. You could
> > > > very easily just return the probabilities together with the
> > > > spans, and whoever doesn't need them can ignore them. Or have
> > > > two methods, one with probabilities, one without. Maybe it's
> > > > just where I'm coming from, but I fail to see the advantages of
> > > > the current approach.
> > > >
> > > > --Thilo
> > > >
> > > >
> > > > On 11/01/2017 13:58, Joern Kottmann wrote:
> > > >
> > > > > Hello Thilo,
> > > > >
> > > > > I am interested in your opinion about how this is done
> > > > > currently. We say: "Share the model between threads and
> > > > > create one instance of the component per thread".
> > > > >
> > > > > Wouldn't that work well in your use case?
> > > > >
> > > > > Jörn
> > > > >
> > > > >
> > > > > On Wed, Jan 11, 2017 at 11:05 AM, Thilo Goetz <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > in a recent project, I was using SentenceDetectorME,
> > > > > > TokenizerME and POSTaggerME. It turns out that none of
> > > > > > those is thread safe. This is because the classification
> > > > > > probabilities for the last tag() call (for example) are
> > > > > > stored in a member variable and can be retrieved by a
> > > > > > separate API call.
> > > > > >
> > > > > > I'm planning to build thread safe versions for myself, and
> > > > > > I'd be happy to contribute a patch if there is interest.
> > > > > > This could be done as a conservative extension with an
> > > > > > additional method such as tagReentrant, where the old API
> > > > > > calls would continue to work as before and would still not
> > > > > > be thread safe. Alternatively, one could remodel the API so
> > > > > > that everything was thread safe, but that would break
> > > > > > backwards compatibility.
> > > > > >
> > > > > > Final question: if I do this for the classes mentioned
> > > > > > above, are there other tools that should be made thread
> > > > > > safe while we're at it?
> > > > > >
> > > > > > Opinions?
> > > > > >
> > > > > > --Thilo
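
For readers who want to see the two workarounds from this thread side by side, here is a minimal Java sketch. It is only an illustration under stated assumptions: ToyModel and ToyTagger are hypothetical stand-ins, not OpenNLP classes. ToyTagger merely mimics the pattern described in the original message, where tag() stores the probabilities of the last call in a member variable that a separate probs() call reads back. The sketch shares one model between threads and gives each caller its own component, either through a ThreadLocal or through a method-local instance, so it also works when the caller does not control thread creation (a parallel stream is used here in place of Scala's parallel collections).

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class PerThreadTaggerSketch {

    /** Hypothetical immutable model (assumption): read-only, so safe to share. */
    static final class ToyModel {
        double score(String token) {
            return token.length() / 10.0;
        }
    }

    /**
     * Hypothetical stand-in for a stateful component (assumption, not OpenNLP).
     * tag() keeps the probabilities of the last call in a field, which is the
     * reason a single shared instance is not thread safe.
     */
    static final class ToyTagger {
        private final ToyModel model;
        private double[] lastProbs = new double[0]; // mutable per-call state

        ToyTagger(ToyModel model) {
            this.model = model;
        }

        String[] tag(String[] tokens) {
            lastProbs = new double[tokens.length];
            String[] tags = new String[tokens.length];
            for (int i = 0; i < tokens.length; i++) {
                lastProbs[i] = model.score(tokens[i]);
                tags[i] = tokens[i].length() > 3 ? "LONG" : "SHORT";
            }
            return tags;
        }

        double[] probs() {
            return lastProbs.clone();
        }
    }

    // One model for the whole application, shared between all threads.
    static final ToyModel SHARED_MODEL = new ToyModel();

    // Workaround 1: one tagger per thread, created lazily on first use.
    static final ThreadLocal<ToyTagger> TAGGER =
            ThreadLocal.withInitial(() -> new ToyTagger(SHARED_MODEL));

    /** Tags with the calling thread's own tagger and returns its probabilities. */
    static double[] tagAndGetProbs(String[] tokens) {
        ToyTagger tagger = TAGGER.get();
        tagger.tag(tokens);
        return tagger.probs();
    }

    // Workaround 2: a method-local tagger, reachable only from this call stack.
    static double[] tagWithLocalInstance(String[] tokens) {
        ToyTagger tagger = new ToyTagger(SHARED_MODEL);
        tagger.tag(tokens);
        return tagger.probs();
    }

    public static void main(String[] args) {
        // The parallel stream decides which threads run the work; each call
        // must still see the probabilities of its own input, never another's.
        boolean consistent = IntStream.range(1, 200).parallel().allMatch(n -> {
            String[] tokens = new String[n];
            Arrays.fill(tokens, "word");
            return tagAndGetProbs(tokens).length == n
                    && tagWithLocalInstance(tokens).length == n;
        });
        System.out.println("per-call results consistent: " + consistent);
    }
}
```

Both variants leave the component itself stateful. The alternative proposed in the thread, returning the probabilities together with the tags from a single call, would make these wrappers unnecessary.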
