Control over threading is not required to "share the model between threads
and create one instance of the component per thread".

One could use a scope in which variable references are guaranteed to live on
the call stack (for example, method-local variables in Java), so that objects
reachable only through them are never shared with other threads.

You could then:

a) Instantiate the components on the call stack (as method-local variables).
b) Instantiate the models in the constructor or a factory method of a
singleton.
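A minimal sketch of steps a) and b) in plain Java follows. `Model`, `Tagger` and `ModelHolder` are hypothetical stand-ins for an OpenNLP model and its stateful component (for example a POS model and POSTaggerME). They are not OpenNLP API. The model is created once in an enum singleton. Each call builds its own component as a method-local variable, so no component is ever reachable from two threads. A parallel stream supplies the threads to show that the caller does not need to control thread creation.

```java
import java.util.Arrays;

// Sketch: models in a singleton, components on the call stack.
// Model, Tagger and ModelHolder are hypothetical stand-ins, not OpenNLP classes.
public class CallStackComponents {

    // Immutable "model": safe to share between threads.
    static final class Model {
        private final String defaultTag;
        Model(String defaultTag) { this.defaultTag = defaultTag; }
        String defaultTag() { return defaultTag; }
    }

    // Stateful "component": keeps the last call's probabilities in a field,
    // so a single instance must not be used by two threads at once.
    static final class Tagger {
        private final Model model;
        private double[] lastProbs = new double[0];
        Tagger(Model model) { this.model = model; }
        String[] tag(String[] tokens) {
            String[] tags = new String[tokens.length];
            lastProbs = new double[tokens.length];
            for (int i = 0; i < tokens.length; i++) {
                tags[i] = model.defaultTag(); // placeholder for real classification
                lastProbs[i] = 1.0;
            }
            return tags;
        }
        double[] probs() { return lastProbs; }
    }

    // Enum singleton: the model is created once (real code would load it from disk).
    enum ModelHolder {
        INSTANCE;
        private final Model model = new Model("NN");
        Model model() { return model; }
    }

    // The component is a local variable: every call, and hence every thread,
    // gets its own instance; only the shared model is reachable from outside.
    static String[] tagSentence(String[] tokens) {
        Tagger tagger = new Tagger(ModelHolder.INSTANCE.model());
        return tagger.tag(tokens);
    }

    public static void main(String[] args) {
        String[][] sentences = { {"a", "b"}, {"c"}, {"d", "e", "f"} };
        // Threads come from the parallel stream's pool, not from this code.
        Arrays.stream(sentences).parallel()
              .map(CallStackComponents::tagSentence)
              .forEach(tags -> System.out.println(Arrays.toString(tags)));
    }
}
```

The sketch assumes that constructing a component is cheap compared with loading its model. If that does not hold for a given component, creating it once per call may be too expensive.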

If one were using OpenNLP in a Tomcat webapp, for instance, I believe this
method would work.

Cohan Sujay Carlos


On Wed, Jan 11, 2017 at 7:08 PM, Thilo Goetz <[email protected]> wrote:

> Correct me if I'm wrong, but that approach only works if you control the
> thread creation yourself. In my case, for example, I was using Scala's
> parallel collection API, and had no control over the threading. I will
> usually want to create one service that does tokenization or POS tagging or
> whatever, which can be accessed by many threads. I don't want to have to
> mess around with an object pool, or thread locals, or anything like that.
> Especially since there is really no good reason IMHO. You could very easily
> just return the probabilities together with the spans, and whoever doesn't
> need them can ignore them. Or have two methods, one with probabilities, one
> without. Maybe it's just where I'm coming from, but I fail to see the
> advantages of the current approach.
>
> --Thilo
>
>
>
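Thilo's suggestion can be sketched as follows. The tagger returns the probabilities together with the results, or it offers two methods, one with probabilities and one without. `TaggedResult` and `StatelessTagger` are hypothetical names, not part of OpenNLP, and the tagging rule is a toy placeholder for a real model. The tagger has no mutable fields, so a single instance can be shared by all threads. This includes threads created by a parallel collection the caller does not control.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of a stateless tagger; TaggedResult and StatelessTagger are hypothetical.
public class StatelessTaggerDemo {

    // Immutable result holding everything one call produces.
    static final class TaggedResult {
        private final String[] tags;
        private final double[] probs;
        TaggedResult(String[] tags, double[] probs) {
            this.tags = tags.clone();   // defensive copies keep the result immutable
            this.probs = probs.clone();
        }
        String[] tags() { return tags.clone(); }
        double[] probs() { return probs.clone(); }
    }

    // No mutable fields: one instance can serve any number of threads.
    static final class StatelessTagger {
        private final String defaultTag; // stands in for a shared, read-only model
        StatelessTagger(String defaultTag) { this.defaultTag = defaultTag; }

        // Variant that also returns probabilities.
        TaggedResult tagWithProbs(String[] tokens) {
            String[] tags = new String[tokens.length];
            double[] probs = new double[tokens.length];
            for (int i = 0; i < tokens.length; i++) {
                tags[i] = tokens[i].endsWith("s") ? "NNS" : defaultTag; // toy rule
                probs[i] = 0.9; // placeholder score
            }
            return new TaggedResult(tags, probs);
        }

        // Variant for callers that do not need probabilities.
        String[] tag(String[] tokens) {
            return tagWithProbs(tokens).tags();
        }
    }

    public static void main(String[] args) {
        StatelessTagger tagger = new StatelessTagger("NN");
        List<String> out = Arrays.asList("cats", "dog", "birds").parallelStream()
                .map(w -> tagger.tag(new String[] {w})[0])
                .collect(Collectors.toList());
        System.out.println(out); // [NNS, NN, NNS]
    }
}
```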
> On 11/01/2017 13:58, Joern Kottmann wrote:
>
>> Hello Thilo,
>>
>> I am interested in your opinion about how this is done currently.
>> We say: "Share the model between threads and create one instance of the
>> component per thread".
>>
>> Wouldn't that work well in your use case?
>>
>> Jörn
>>
>>
>>
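A library may create the threads itself, as a Scala parallel collection, a Java parallel stream or a servlet container does. One possible way to keep the "one component per thread" rule in that case is `java.lang.ThreadLocal`. This is a sketch of that option only, not an approach prescribed by the OpenNLP documentation. `Model` and `Tagger` are hypothetical stand-ins for an OpenNLP model and component.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: shared model, one component per thread via ThreadLocal.
// Model and Tagger are hypothetical stand-ins, not OpenNLP classes.
public class ThreadLocalComponents {

    static final class Model { // immutable, shared by all threads
        final String defaultTag;
        Model(String defaultTag) { this.defaultTag = defaultTag; }
    }

    static final class Tagger { // stateful, not thread safe
        private final Model model;
        private double[] lastProbs = new double[0];
        Tagger(Model model) { this.model = model; }
        String[] tag(String[] tokens) {
            String[] tags = new String[tokens.length];
            lastProbs = new double[tokens.length];
            Arrays.fill(tags, model.defaultTag); // placeholder classification
            Arrays.fill(lastProbs, 1.0);
            return tags;
        }
        double[] probs() { return lastProbs; }
    }

    static final Model MODEL = new Model("NN");

    // Each thread lazily gets its own Tagger on first use; all share MODEL.
    static final ThreadLocal<Tagger> TAGGER =
            ThreadLocal.withInitial(() -> new Tagger(MODEL));

    static String[] tag(String[] tokens) {
        return TAGGER.get().tag(tokens);
    }

    public static void main(String[] args) {
        // Worker threads come from the common fork-join pool, not from this code.
        Set<Tagger> instances = ConcurrentHashMap.newKeySet();
        Arrays.asList("a b", "c", "d e f").parallelStream().forEach(s -> {
            tag(s.split(" "));
            instances.add(TAGGER.get()); // records which per-thread instance was used
        });
        System.out.println("Tagger instances used: " + instances.size());
    }
}
```

A `ThreadLocal` value lives as long as its thread. In pooled environments, the per-thread components therefore stay alive for the lifetime of the pool.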
>> On Wed, Jan 11, 2017 at 11:05 AM, Thilo Goetz <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> in a recent project, I was using SentenceDetectorME, TokenizerME and
>>> POSTaggerME. It turns out that none of those is thread safe. This is
>>> because the classification probabilities for the last tag() call (for
>>> example) are stored in a member variable and can be retrieved by a
>>> separate API call.
>>>
>>> I'm planning to build thread-safe versions for myself, and I'd be happy to
>>> contribute a patch if there is interest. This could be done as a
>>> conservative extension with an additional method such as tagReentrant,
>>> where the old API calls would continue to work as before and would still
>>> not be thread safe. Alternatively, one could remodel the API so that
>>> everything would be thread safe, but that would break backwards
>>> compatibility.
>>>
>>> Final question: if I do this for the classes mentioned above, are there
>>> other tools that should be made thread safe while we're at it?
>>>
>>> Opinions?
>>>
>>> --Thilo
>>>
>>>
>>>
>>>
>
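The conservative extension Thilo describes adds a new reentrant method next to the unchanged legacy calls. It could look roughly like this. Only the name `tagReentrant` comes from the message above. `Tagger` and `TagResult` are hypothetical, and the scores are placeholders. The legacy `tag()`/`probs()` pair still stores the last probabilities in a field, so it remains unsafe to share. `tagReentrant` writes to no fields at all.

```java
import java.util.Arrays;

// Sketch of a reentrant method added alongside a legacy stateful API.
// Tagger and TagResult are hypothetical; only the name tagReentrant comes from the thread.
public class ReentrantExtension {

    static final class TagResult {
        final String[] tags;
        final double[] probs;
        TagResult(String[] tags, double[] probs) { this.tags = tags; this.probs = probs; }
    }

    static final class Tagger {
        private final String defaultTag;            // stands in for a shared model
        private double[] lastProbs = new double[0]; // legacy per-call state

        Tagger(String defaultTag) { this.defaultTag = defaultTag; }

        // New entry point: reads only the final field and writes only locals,
        // so concurrent calls on one instance do not interfere.
        TagResult tagReentrant(String[] tokens) {
            String[] tags = new String[tokens.length];
            double[] probs = new double[tokens.length];
            for (int i = 0; i < tokens.length; i++) {
                tags[i] = defaultTag;
                probs[i] = 1.0 / (i + 1); // placeholder scores
            }
            return new TagResult(tags, probs);
        }

        // Legacy API with unchanged behaviour: not thread safe, because the
        // probabilities of the last call are kept in a field for probs().
        String[] tag(String[] tokens) {
            TagResult r = tagReentrant(tokens);
            lastProbs = r.probs;
            return r.tags;
        }

        double[] probs() { return lastProbs; }
    }

    public static void main(String[] args) {
        Tagger tagger = new Tagger("NN");
        TagResult r = tagger.tagReentrant(new String[] {"a", "b"});
        System.out.println(Arrays.toString(r.tags) + " " + Arrays.toString(r.probs));
        tagger.tag(new String[] {"c"});
        System.out.println(Arrays.toString(tagger.probs()));
    }
}
```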
