Re: Thread-safe versions of some of the tools

Cohan Sujay Carlos Wed, 11 Jan 2017 07:06:12 -0800

I meant:

a)  Instantiate the components in a local scope, so that their references
are held on the call (thread) stack.
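A minimal Java sketch of (a) together with (b) from the message quoted below. All classes here are hypothetical stand-ins, not OpenNLP API. `SharedModel` plays the role of a model object, assumed immutable and safe to share. `StatefulTagger` plays the role of a component such as POSTaggerME that keeps per-call state in a field. `ModelHolder` is a lazily initialised singleton.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a model: immutable, so safe to share between threads.
final class SharedModel {
    private final String suffix;
    SharedModel(String suffix) { this.suffix = suffix; }
    String label(String token) { return token + suffix; }
}

// Hypothetical stand-in for a non-thread-safe component: it keeps the
// probabilities of its last call in a mutable field.
final class StatefulTagger {
    private final SharedModel model;
    private double[] lastProbs = new double[0];
    StatefulTagger(SharedModel model) { this.model = model; }
    String[] tag(String[] tokens) {
        String[] tags = new String[tokens.length];
        lastProbs = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            tags[i] = model.label(tokens[i]);
            lastProbs[i] = 1.0; // placeholder probability
        }
        return tags;
    }
    double[] probs() { return lastProbs; }
}

// (b) The model is created once, lazily, by a singleton (holder idiom).
final class ModelHolder {
    private ModelHolder() {}
    private static final class Lazy {
        static final SharedModel INSTANCE = new SharedModel("/TAG");
    }
    static SharedModel model() { return Lazy.INSTANCE; }
}

final class TaggingService {
    // (a) The component is a method-local variable: every call, and therefore
    // every thread, works on its own instance; only the model is shared.
    static String[] tag(String[] tokens) {
        StatefulTagger tagger = new StatefulTagger(ModelHolder.model());
        return tagger.tag(tokens);
    }

    // Runs `tasks` calls on a pool of worker threads.
    static List<String> tagConcurrently(int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final String word = "w" + i;
                futures.add(pool.submit(() -> tag(new String[] {word})[0]));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because `tagger` is a local variable, no other thread can see it, so the field write inside `tag()` is never shared. The trade-off is one component allocation per call, which is reasonable only under the assumption that constructing a component is cheap compared with loading the model.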



On Wed, Jan 11, 2017 at 8:33 PM, Cohan Sujay Carlos <[email protected]>
wrote:

> Control over threading is not required to "share the model between
> threads and create one instance of the component per thread".
>
> One could use a scope where variable references are guaranteed to be
> stored on the call stack (say, method-local variables in Java).
>
> You could then:
>
> a)  Instantiate the components on the call stack.
> b)  Instantiate the models in constructors or the factory methods of a
> singleton.
>
> If one were using OpenNLP in a Tomcat webapp, for instance, one could, I
> believe, use this method.
>
> Cohan Sujay Carlos
>
>
> On Wed, Jan 11, 2017 at 7:08 PM, Thilo Goetz <[email protected]> wrote:
>
>> Correct me if I'm wrong, but that approach only works if you control the
>> thread creation yourself. In my case, for example, I was using Scala's
>> parallel collection API, and had no control over the threading. I will
>> usually want to create one service that does tokenization or POS tagging or
>> whatever, which can be accessed by many threads. I don't want to have to
>> mess around with an object pool, or thread locals, or anything like that.
>> Especially since there is really no good reason for it, IMHO. You could very easily
>> just return the probabilities together with the spans, and whoever doesn't
>> need them can ignore them. Or have two methods, one with probabilities, one
>> without. Maybe it's just where I'm coming from, but I fail to see the
>> advantages of the current approach.
>>
>> --Thilo
>>
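A sketch of the design described above, using hypothetical names (`TagResult`, `StatelessTagger`, `SharedServiceDemo`) rather than any existing OpenNLP API. The probabilities are returned together with the tags. There are two methods, one with probabilities and one without. A single instance is shared by all worker threads of a Java parallel stream, which stands in here for Scala's parallel collections.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical result type: tags and probabilities are returned together,
// so nothing about a call has to be remembered in the tagger's fields.
final class TagResult {
    final String[] tags;
    final double[] probs;
    TagResult(String[] tags, double[] probs) {
        this.tags = tags;
        this.probs = probs;
    }
}

// Hypothetical stateless tagger: its only field is final and never mutated,
// and all per-call state lives in local variables.
final class StatelessTagger {
    private final String suffix; // stands in for an immutable model

    StatelessTagger(String suffix) { this.suffix = suffix; }

    // Method with probabilities.
    TagResult tagWithProbs(String[] tokens) {
        String[] tags = new String[tokens.length];
        double[] probs = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            tags[i] = tokens[i] + suffix;
            probs[i] = 1.0 / (1 + tokens[i].length()); // placeholder score
        }
        return new TagResult(tags, probs);
    }

    // Method without probabilities, for callers that do not need them.
    String[] tag(String[] tokens) {
        return tagWithProbs(tokens).tags;
    }
}

final class SharedServiceDemo {
    // One shared tagger used by the worker threads of a parallel stream.
    static List<String> tagAll(StatelessTagger tagger, List<String> words) {
        return words.parallelStream()
                .map(w -> tagger.tag(new String[] {w})[0])
                .collect(Collectors.toList());
    }
}
```

Since the only field is final and never changes after construction, one instance can be shared without an object pool or thread locals.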
>>
>>
>> On 11/01/2017 13:58, Joern Kottmann wrote:
>>
>>> Hello Thilo,
>>>
>>> I am interested in your opinion about how this is done currently.
>>> We say: "Share the model between threads and create one instance of the
>>> component per thread".
>>>
>>> Wouldn't that work well in your use case?
>>>
>>> Jörn
>>>
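When the caller does not create the threads (as with the Scala parallel collections mentioned above), one way to get "one instance of the component per thread" is a `ThreadLocal`. This is a minimal sketch with hypothetical classes. `LastCallTagger` stands in for a non-thread-safe component, and the string `MODEL` stands in for the shared model.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical non-thread-safe component: it remembers the probabilities
// of its last call in a mutable field.
final class LastCallTagger {
    private final String model; // stands in for the shared, immutable model
    private double[] lastProbs = new double[0];
    LastCallTagger(String model) { this.model = model; }
    String tag(String token) {
        lastProbs = new double[] {1.0}; // placeholder probability
        return token + model;
    }
    double[] probs() { return lastProbs; }
}

final class PerThreadTagging {
    private static final String MODEL = "/NN"; // shared by all threads

    // Each thread lazily receives its own component built on the shared model.
    private static final ThreadLocal<LastCallTagger> TAGGER =
            ThreadLocal.withInitial(() -> new LastCallTagger(MODEL));

    // Bookkeeping for the demo: which threads and components were used.
    static final Set<Thread> THREADS = ConcurrentHashMap.newKeySet();
    static final Set<LastCallTagger> COMPONENTS = ConcurrentHashMap.newKeySet();

    static String tag(String token) {
        LastCallTagger tagger = TAGGER.get();
        THREADS.add(Thread.currentThread());
        COMPONENTS.add(tagger);
        return tagger.tag(token);
    }

    // The caller does not control these threads: they come from the common
    // fork-join pool that backs parallel streams (plus the calling thread).
    static List<String> tagAll(int n) {
        return IntStream.range(0, n).parallel()
                .mapToObj(i -> tag("w" + i))
                .collect(Collectors.toList());
    }
}
```

Each thread creates its component on first use and reuses it afterwards, so the number of distinct components equals the number of distinct threads that did work. With pooled threads, as in a servlet container, these per-thread instances live as long as the pool threads do.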
>>>
>>>
>>> On Wed, Jan 11, 2017 at 11:05 AM, Thilo Goetz <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> in a recent project, I was using SentenceDetectorME, TokenizerME and
>>>> POSTaggerME. It turns out that none of those is thread safe. This is
>>>> because the classification probabilities for the last tag() call (for
>>>> example) are stored in a member variable and can be retrieved by a
>>>> separate API call.
>>>>
>>>> I'm planning to build thread-safe versions for myself, and I'd be happy
>>>> to contribute a patch if there is interest. This could be done as a
>>>> conservative extension with an additional method such as tagReentrant,
>>>> where the old API calls would continue to work as before and would still
>>>> not be thread safe. Alternatively, one could remodel the API so that
>>>> everything is thread safe, but that would break backwards compatibility.
>>>>
>>>> Final question: if I do this for the classes mentioned above, are there
>>>> other tools that should be made thread safe while we're at it?
>>>>
>>>> Opinions?
>>>>
>>>> --Thilo
>>>>
>>>>
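The "conservative extension" proposed above could look roughly like this. `ExtendedTagger` and `TagsAndProbs` are hypothetical. Only the method name `tagReentrant` comes from the message, and the tagging logic is a placeholder.

```java
// Hypothetical result holder returned by the reentrant method.
final class TagsAndProbs {
    final String[] tags;
    final double[] probs;
    TagsAndProbs(String[] tags, double[] probs) {
        this.tags = tags;
        this.probs = probs;
    }
}

// Hypothetical tagger: the existing tag()/probs() pair keeps its current,
// non-thread-safe behaviour, while tagReentrant() touches no mutable fields.
final class ExtendedTagger {
    private final String model;                  // stands in for the shared, immutable model
    private double[] bestProbs = new double[0];  // legacy per-call state, as in the current API

    ExtendedTagger(String model) { this.model = model; }

    // New, reentrant entry point: all per-call state is local.
    TagsAndProbs tagReentrant(String[] tokens) {
        String[] tags = new String[tokens.length];
        double[] probs = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            tags[i] = tokens[i] + model;
            probs[i] = 0.5; // placeholder probability
        }
        return new TagsAndProbs(tags, probs);
    }

    // Legacy API, unchanged for callers: it delegates to the reentrant
    // method but still stores the probabilities in a field.
    String[] tag(String[] tokens) {
        TagsAndProbs r = tagReentrant(tokens);
        bestProbs = r.probs;
        return r.tags;
    }

    double[] probs() { return bestProbs; }
}
```

Existing callers of `tag()` followed by `probs()` see the same behaviour as before, and that pair is still not thread safe. New callers that use only `tagReentrant()` can share one instance across threads.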
>>>>
>>>>
>>
>
