Re: Opennlp thread safety in Stanbol

Rupert Westenthaler Sat, 02 Mar 2013 06:57:16 -0800

On Sat, Mar 2, 2013 at 2:23 PM, harish suvarna <[email protected]> wrote:
> Rupert,
> Who creates the one instance per thread, specifically one OpenNLP
> tokenizer/POS tagger per thread? Is it commons.opennlp, or does
> Stanbol have its own code?

You must have misunderstood me.

* Models are singletons that are used by all threads. SentenceModel,
TokenizerModel, POSModel, ChunkerModel and TokenNameFinderModel are
all singletons. They need a lot of memory, so it is good to have only
one instance of each.
* SentenceDetectors, Tokenizers, POSTagger, Chunker and
TokenNameFinders are created for each request (on top of the singleton
models). Those are lightweight components so reusing them would not
bring much of an advantage.
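
The pattern in the two points above can be sketched roughly as follows.
This is only an illustration and not the actual Stanbol or OpenNLP code:
Model, Tagger, MODEL and handleRequest are hypothetical stand-ins for a
heavyweight immutable model (such as POSModel) and a lightweight,
stateful component built on top of it (such as a POSTagger).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedModelDemo {

    /** Hypothetical stand-in for a heavyweight, immutable model. */
    static final class Model {
        private final Map<String, String> tags;
        Model(Map<String, String> tags) { this.tags = Map.copyOf(tags); }
        String lookup(String token) { return tags.getOrDefault(token, "UNK"); }
    }

    /** Hypothetical stand-in for a lightweight tagger that is NOT thread safe. */
    static final class Tagger {
        private final Model model;
        // Mutable per-instance state: the reason one instance must not be shared.
        private final List<String> buffer = new ArrayList<>();
        Tagger(Model model) { this.model = model; }
        List<String> tag(String[] tokens) {
            buffer.clear();
            for (String token : tokens) {
                buffer.add(model.lookup(token));
            }
            return new ArrayList<>(buffer);
        }
    }

    // Created once and shared by all threads (read-only after construction).
    static final Model MODEL =
            new Model(Map.of("the", "DT", "cat", "NN", "sleeps", "VBZ"));

    /** Handles one request: a fresh tagger on top of the shared model. */
    static List<String> handleRequest(String text) {
        Tagger tagger = new Tagger(MODEL);
        return tagger.tag(text.split(" "));
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<List<String>>> results = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            results.add(pool.submit(() -> handleRequest("the cat sleeps")));
        }
        for (Future<List<String>> result : results) {
            System.out.println(result.get());
        }
        pool.shutdown();
    }
}
```

No locking is needed in this sketch because the only shared object is
read-only, and every mutable object is confined to the request that
created it.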

The code for loading and managing the singleton models is part of the
org.apache.stanbol.commons.opennlp module (see
org.apache.stanbol.commons.opennlp.OpenNLP for details). But this
class is mainly about

* OSGI integration
* using the Stanbol DataFileProvider [1] infrastructure for loading model files.

and not about working around OpenNLP concurrency issues. Actually, the
way OpenNLP handles concurrency seems just fine to me. I had much
more trouble with concurrency when integrating Freeling [2] and
Talismane [3] with Stanbol.

best
Rupert

[1] http://stanbol.apache.org/docs/trunk/utils/datafileprovider
[2] https://github.com/insideout10/stanbol-freeling
[3] https://github.com/westei/stanbol-talismane

>
> -harish
>
> On Sat, Mar 2, 2013 at 5:14 AM, Rupert Westenthaler <
> [email protected]> wrote:
>
>> Hi
>>
>> Stanbol uses a single instance of each model (e.g. POSModel). They are
>> loaded and managed by the OpenNLP service (commons.opennlp module).
>> Stanbol does not reuse the OpenNLP Tagger, Finder, ... objects built on
>> top of the models (e.g. a POSTagger on top of the POSModel), so each
>> request creates a new instance. This is exactly because POSTaggers,
>> Tokenizers, ... are not thread safe (as stated by the documentation).
>> As the documentation also mentions that those objects are rather
>> lightweight, caching them in ResourcePools or ThreadLocal variables
>> was not considered.
>>
>> best
>> Rupert
>>
>> On Sat, Mar 2, 2013 at 1:23 PM, harish suvarna <[email protected]> wrote:
>> > The OpenNLP documentation says the POS tagger, tokenizer etc. are not
>> > thread safe. A couple of Internet posts and OpenNLP discussion forums
>> > also indicate this.
>> > How does Stanbol use OpenNLP in a thread-safe way? Do you use Java
>> > synchronized, thread-locals or any other Java locking to make it
>> > thread safe?
>> > I have not run into these thread-safety issues in Stanbol yet. The
>> > OpenNLP guy says to create one instance of the OpenNLP components
>> > per thread.
>> >
>> >
>> http://grokbase.com/t/opennlp/dev/1176mzaen1/thread-safety-or-lack-thereof
>> > --
>> > Thanks
>> > Harish
>>
>>
>>
>> --
>> | Rupert Westenthaler             [email protected]
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>
>
>
>
> --
> Thanks
> Harish

--
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
