Re: Opennlp thread safety in Stanbol

harish suvarna Mon, 04 Mar 2013 16:42:08 -0800

Got it Rupert. Thanks.
-harish

On Sat, Mar 2, 2013 at 6:56 AM, Rupert Westenthaler <
[email protected]> wrote:


> On Sat, Mar 2, 2013 at 2:23 PM, harish suvarna <[email protected]> wrote:
> > Rupert,
> > Who creates the one instance per thread, specifically one OpenNLP
> > tokenizer/POS tagger per thread? Is it commons.opennlp, or does
> > Stanbol have its own code?
>
> You must have misunderstood me.
>
> * Models are singletons that are used by all threads. SentenceModel,
> TokenizerModel, POSModel, ChunkerModel and TokenNameFinderModel are
> all singletons. They need a lot of memory, so it is good to
> have them as singletons.
> * SentenceDetectors, Tokenizers, POSTaggers, Chunkers and
> TokenNameFinders are created for each request (on top of the singleton
> models). Those are lightweight components, so reusing them would not
> bring much of an advantage.
>
> The code for loading and managing the singleton models is part of the
> org.apache.stanbol.commons.opennlp module (see
> org.apache.stanbol.commons.opennlp.OpenNLP for details). But this
> class is mainly about
>
> * OSGI integration
> * using the Stanbol DataFileProvider [1] infrastructure for loading model
> files.
>
> and not about working around OpenNLP concurrency issues. Actually, the
> way OpenNLP deals with concurrency seems just fine to me. I had much
> more trouble with concurrency when integrating Freeling [2] and
> Talismane [3] with Stanbol.
>
> best
> Rupert
>
> [1] http://stanbol.apache.org/docs/trunk/utils/datafileprovider
> [2] https://github.com/insideout10/stanbol-freeling
> [3] https://github.com/westei/stanbol-talismane
>
> >
> > -harish
> >
> > On Sat, Mar 2, 2013 at 5:14 AM, Rupert Westenthaler <
> > [email protected]> wrote:
> >
> >> Hi
> >>
> >> Stanbol uses a single instance of each Model (e.g. POSModel). They are
> >> loaded and managed by the OpenNLP service (commons.opennlp module).
> >> Stanbol does not reuse OpenNLP Tagger, Finder, ... objects built on
> >> top of models (e.g. POSTagger on top of the POSModel). So each request
> >> will create a new instance. This is exactly because POSTaggers,
> >> Tokenizers ... are not thread safe (as stated by the documentation).
> >> As the documentation also mentions that those objects are rather
> >> lightweight, caching them in ResourcePools or ThreadLocal variables
> >> was not considered.
> >>
> >> best
> >> Rupert
> >>
> >> On Sat, Mar 2, 2013 at 1:23 PM, harish suvarna <[email protected]> wrote:
> >> > The OpenNLP documentation says the POS tagger, tokenizer etc. are not
> >> > thread safe. A couple of Internet posts and OpenNLP discussion forums
> >> > also indicate this. How is Stanbol using OpenNLP to make it thread
> >> > safe? Do you use Java synchronized, thread-local or any Java locking
> >> > to make it thread safe? I have not run into thread-safety issues in
> >> > Stanbol yet. The OpenNLP guy says to create one instance of the
> >> > OpenNLP components per thread.
> >> >
> >> > http://grokbase.com/t/opennlp/dev/1176mzaen1/thread-safety-or-lack-thereof
> >> > --
> >> > Thanks
> >> > Harish
> >>
> >>
> >>
> >> --
> >> | Rupert Westenthaler             [email protected]
> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> | A-5500 Bischofshofen
> >>
> >
> >
> >
> > --
> > Thanks
> > Harish
>
>
>
> --
> | Rupert Westenthaler             [email protected]
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>



-- 
Thanks
Harish
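
For readers of the archive, the pattern Rupert describes (one shared,
read-only model; a fresh lightweight tagger per request) can be sketched
roughly as below. This is only an illustration, not Stanbol's or OpenNLP's
actual code: Model and Tagger are hypothetical stand-ins so the example runs
with the JDK alone. With OpenNLP itself the corresponding step would be
creating a new POSTaggerME from the shared POSModel for each request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedModelDemo {

    // Hypothetical stand-in for a heavy, immutable model (think POSModel).
    // It is read-only after construction, so all threads can share it.
    static final class Model {
        private final Map<String, String> tags;

        Model(Map<String, String> tags) {
            this.tags = Map.copyOf(tags);
        }

        String lookup(String token) {
            return tags.getOrDefault(token, "UNK");
        }
    }

    // Hypothetical stand-in for a lightweight tagger. It keeps mutable
    // state between calls, so one instance must not be shared by threads.
    static final class Tagger {
        private final Model model;
        private final List<String> buffer = new ArrayList<>();

        Tagger(Model model) {
            this.model = model;
        }

        String[] tag(String[] tokens) {
            buffer.clear();
            for (String token : tokens) {
                buffer.add(model.lookup(token));
            }
            return buffer.toArray(new String[0]);
        }
    }

    // The single model instance used by every request.
    static final Model MODEL =
            new Model(Map.of("Stanbol", "NNP", "uses", "VBZ", "OpenNLP", "NNP"));

    // Each request builds its own Tagger on top of the shared model.
    static String[] handleRequest(String[] tokens) {
        return new Tagger(MODEL).tag(tokens);
    }

    // Runs several requests on a small thread pool and collects the results.
    static List<String[]> runConcurrently(int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String[]>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                futures.add(pool.submit(
                        () -> handleRequest(new String[] {"Stanbol", "uses", "OpenNLP"})));
            }
            List<String[]> results = new ArrayList<>();
            for (Future<String[]> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        for (String[] tags : runConcurrently(8)) {
            System.out.println(String.join(" ", tags));
        }
    }
}
```

Because no Tagger instance ever crosses a thread boundary, no locking or
ThreadLocal is needed; the only shared object is the immutable model.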
