Re: Thread-safe versions of some of the tools

Joern Kottmann Wed, 11 Jan 2017 13:54:19 -0800

+1 ease of use is important for us and has always been a strong focus
here.


Jörn

On Wed, 2017-01-11 at 17:39 +0100, Thilo Goetz wrote:
> You can do all sorts of things. I implemented a version now that uses
> ThreadLocals. Works fine, but quite frankly, it's a pain in the butt.
> The world has been moving to multi-threaded for a long time now, and I
> think it's a very reasonable assumption that a simple tool like a POS
> tagger is thread safe, without me as an API user having to think about
> it.
> 
> When I ran the OpenNLP stack multi-threaded and saw the exceptions, I
> read the signs and figured out what the issue was. Not everybody will be
> able to do that. They will just see that it crashes, and move on to a
> different tool. If the POS tagger can not be made thread safe, that's
> what I will do, actually.
> 
> But, that's just my opinion. If your approach works for you, that's
> great.
> 
> 
> On 11/01/2017 16:05, Cohan Sujay Carlos wrote:
> > I meant:
> > 
> > > a)  Instantiate the components in the local scope that leads to their
> > > references being in the call (thread) stack.
> > 
> > 
> > > On Wed, Jan 11, 2017 at 8:33 PM, Cohan Sujay Carlos <[email protected]>
> > > wrote:
> > 
> > > > Control over threading is not required to "share the model between
> > > > threads and create one instance of the component per thread".
> > > > 
> > > > One could use a scope where variable references are guaranteed to be
> > > > stored in the call stack (say method-local variables in Java).
> > > > 
> > > > You could then:
> > > > 
> > > > a)  Instantiate the components on the call stack.
> > > > b)  Instantiate the models in constructors or the factory methods of a
> > > > singleton.
> > > > 
> > > > If one were using OpenNLP in a Tomcat webapp, for instance, one could, I
> > > > believe, use this method.
> > > 
> > > Cohan Sujay Carlos
> > > 
> > > 
> > > On Wed, Jan 11, 2017 at 7:08 PM, Thilo Goetz <[email protected]>
> > > wrote:
> > > 
> > > > > Correct me if I'm wrong, but that approach only works if you control the
> > > > > thread creation yourself. In my case, for example, I was using Scala's
> > > > > parallel collection API, and had no control over the threading. I will
> > > > > usually want to create one service that does tokenization or POS tagging or
> > > > > whatever, which can be accessed by many threads. I don't want to have to
> > > > > mess around with an object pool, or thread locals, or anything like that.
> > > > > Especially since there is really no good reason IMHO. You could very easily
> > > > > just return the probabilities together with the spans, and whoever doesn't
> > > > > need them can ignore them. Or have two methods, one with probabilities, one
> > > > > without. Maybe it's just where I'm coming from, but I fail to see the
> > > > > advantages of the current approach.
> > > > 
> > > > --Thilo
> > > > 
> > > > 
> > > > 
> > > > On 11/01/2017 13:58, Joern Kottmann wrote:
> > > > 
> > > > > Hello Thilo,
> > > > > 
> > > > > > I am interested in your opinion about how this is done currently.
> > > > > > We say: "Share the model between threads and create one instance of the
> > > > > > component per thread".
> > > > > 
> > > > > Wouldn't that work well in your use case?
> > > > > 
> > > > > Jörn
> > > > > 
> > > > > 
> > > > > 
> > > > > > On Wed, Jan 11, 2017 at 11:05 AM, Thilo Goetz <[email protected]> wrote:
> > > > > > 
> > > > > > > Hi,
> > > > > > > in a recent project, I was using SentenceDetectorME, TokenizerME and
> > > > > > > POSTaggerME. It turns out that none of those is thread safe. This is
> > > > > > > because the classification probabilities for the last tag() call (for
> > > > > > > example) are stored in a member variable and can be retrieved by a
> > > > > > > separate API call.
> > > > > > 
> > > > > > > I'm planning to build thread safe versions for myself, and I'd be happy to
> > > > > > > contribute a patch if there is interest. This could be done as a
> > > > > > > conservative extension with an additional method such as tagReentrant,
> > > > > > > where the old API calls would continue to work as before and would still
> > > > > > > not be thread safe. Alternatively, one could remodel the API so that
> > > > > > > everything was thread safe, but that would break backwards compatibility.
> > > > > > 
> > > > > > > Final question: if I do this for the classes mentioned above, are there
> > > > > > > other tools that should be made thread safe while we're at it?
> > > > > > 
> > > > > > Opinions?
> > > > > > 
> > > > > > --Thilo
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> 
>
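
For readers following the thread, here is a minimal Java sketch of the two API shapes being discussed: one where the probabilities of the last tag() call are kept in a member variable, and one where they are returned together with the tags. It does not use OpenNLP itself. All class and method names (StatefulTagger, StatelessTagger, TagResult, tagWithProbs, classify) are hypothetical, and classify() is a toy stand-in for a real model, so treat this as an illustration of the idea under those assumptions rather than as the actual OpenNLP code or the patch proposed above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ReentrantTaggerSketch {

    /** Immutable result: tags and their probabilities travel together. */
    static final class TagResult {
        final String[] tags;
        final double[] probs;

        TagResult(String[] tags, double[] probs) {
            this.tags = tags;
            this.probs = probs;
        }
    }

    /** Hypothetical stateful shape: the last call's probabilities live in a field. */
    static final class StatefulTagger {
        private double[] lastProbs = new double[0]; // shared mutable state

        String[] tag(String[] tokens) {
            TagResult r = classify(tokens);
            lastProbs = r.probs; // a concurrent tag() call can overwrite this
            return r.tags;
        }

        double[] probs() {
            return lastProbs; // may belong to another thread's tag() call
        }
    }

    /** Hypothetical reentrant shape: no field is written, so one instance can be shared. */
    static final class StatelessTagger {
        TagResult tagWithProbs(String[] tokens) {
            return classify(tokens);
        }
    }

    /** Toy classifier (an assumption, not a real model): capitalized tokens are "NNP". */
    static TagResult classify(String[] tokens) {
        String[] tags = new String[tokens.length];
        double[] probs = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            boolean capitalized = !tokens[i].isEmpty()
                    && Character.isUpperCase(tokens[i].charAt(0));
            tags[i] = capitalized ? "NNP" : "NN";
            probs[i] = capitalized ? 0.9 : 0.6;
        }
        return new TagResult(tags, probs);
    }

    public static void main(String[] args) {
        StatelessTagger shared = new StatelessTagger();
        List<String[]> sentences = Arrays.asList(
                new String[] {"OpenNLP", "tagger"},
                new String[] {"thread", "safety", "matters"});

        // A parallel stream picks its own worker threads, similar to the
        // "no control over the threading" situation described in the thread.
        List<TagResult> results = sentences.parallelStream()
                .map(shared::tagWithProbs)
                .collect(Collectors.toList());

        for (TagResult r : results) {
            System.out.println(Arrays.toString(r.tags) + " " + Arrays.toString(r.probs));
        }
    }
}
```

Callers that do not need the probabilities can ignore the probs field. With the stateful shape, each thread would need its own tagger instance (for example through a ThreadLocal) to keep tag() and probs() consistent.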
