On Sat, Mar 2, 2013 at 2:23 PM, harish suvarna <[email protected]> wrote:
> Rupert,
> Who creates the one instance per thread, specifically one opennlp
> tokenizer/postagger per thread? Is it
> commons.opennlp, or does Stanbol have its own code?

You must have misunderstood me.

* Models are singletons that are shared by all threads. SentenceModel,
  TokenizerModel, POSModel, ChunkerModel and TokenNameFinderModel are all
  singletons. They need a lot of memory, so it is good to have them as
  singletons.
* SentenceDetectors, Tokenizers, POSTaggers, Chunkers and TokenNameFinders
  are created for each request (on top of the singleton models). Those
  are lightweight components, so reusing them would not bring much of an
  advantage.

The code for loading and managing the singleton models is part of the
org.apache.stanbol.commons.opennlp module (see
org.apache.stanbol.commons.opennlp.OpenNLP for details). But this class
is mainly about

* OSGi integration
* using the Stanbol DataFileProvider [1] infrastructure for loading
  model files

and not about working around OpenNLP concurrency issues. Actually, the
way OpenNLP deals with concurrency seems just fine to me. I had much more
trouble with concurrency when integrating Freeling [2] and Talismane [3]
with Stanbol.

best
Rupert

[1] http://stanbol.apache.org/docs/trunk/utils/datafileprovider
[2] https://github.com/insideout10/stanbol-freeling
[3] https://github.com/westei/stanbol-talismane

>
> -harish
>
> On Sat, Mar 2, 2013 at 5:14 AM, Rupert Westenthaler <
> [email protected]> wrote:
>
>> Hi
>>
>> Stanbol uses a single instance of Models (e.g. POSModel). They are
>> loaded and managed by the OpenNLP service (commons.opennlp module).
>> Stanbol does not reuse OpenNLP Tagger, Finder, ... objects built on
>> top of models (e.g. POSTagger on top of the POSModel). So each request
>> will create a new instance. This is exactly because POSTaggers,
>> Tokenizers ... are not thread safe (as stated by the documentation).
>> As the documentation also mentions that those objects are rather
>> lightweight, caching them in ResourcePools or ThreadLocal variables
>> was not taken into consideration.
>>
>> best
>> Rupert
>>
>> On Sat, Mar 2, 2013 at 1:23 PM, harish suvarna <[email protected]> wrote:
>> > OpenNLP documentation says postagger and tokenizer etc are not thread
>> > safe. Couple of Internet posts and OpenNLP discussion forums also
>> > indicate this.
>> > How is Stanbol using OpenNLP to make it thread safe? Do you use java
>> > synchronized or thread-local or any java locking to make it thread safe?
>> > I have not run into these thread safety issues in Stanbol yet. Opennlp guy
>> > says create one instance of opennlp components per thread.
>> >
>> > http://grokbase.com/t/opennlp/dev/1176mzaen1/thread-safety-or-lack-thereof
>> > --
>> > Thanks
>> > Harish
>>
>> --
>> | Rupert Westenthaler             [email protected]
>> | Bodenlehenstraße 11             ++43-699-11108907
>> | A-5500 Bischofshofen
>
> --
> Thanks
> Harish

--
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11             ++43-699-11108907
| A-5500 Bischofshofen
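
For illustration, the pattern described in the reply (one shared model
instance, a fresh lightweight component per request) could be sketched
roughly as follows. This is only a minimal sketch and not Stanbol's or
OpenNLP's actual code: SharedModel, Tagger, MODEL and handleRequest are
hypothetical stand-ins for something like POSModel and a POS tagger, and
the tiny lookup table is an assumption made so that the example runs
without any model files.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-ins only: none of these are OpenNLP or Stanbol classes.
final class ModelSharingSketch {

    /** Stand-in for a heavy model: immutable after construction, so safe to share. */
    static final class SharedModel {
        private final Map<String, String> tagByWord;

        SharedModel(Map<String, String> tagByWord) {
            // Defensive, unmodifiable copy: no thread can change the model later.
            this.tagByWord = Collections.unmodifiableMap(new HashMap<>(tagByWord));
        }

        String lookup(String word) {
            String tag = tagByWord.get(word);
            return tag != null ? tag : "UNK";
        }
    }

    /** Stand-in for a lightweight component with mutable state: not thread safe. */
    static final class Tagger {
        private final SharedModel model;
        private final List<String> buffer = new ArrayList<>(); // per-instance state

        Tagger(SharedModel model) {
            this.model = model;
        }

        List<String> tag(String[] tokens) {
            buffer.clear();
            for (String token : tokens) {
                buffer.add(model.lookup(token));
            }
            return new ArrayList<>(buffer);
        }
    }

    // Created once and used by all threads (the "singleton model").
    static final SharedModel MODEL = new SharedModel(demoTable());

    private static Map<String, String> demoTable() {
        Map<String, String> table = new HashMap<>();
        table.put("dogs", "NNS");
        table.put("bark", "VBP");
        return table;
    }

    /** One request: a new Tagger on top of the shared model, never reused. */
    static List<String> handleRequest(String text) {
        Tagger tagger = new Tagger(MODEL);
        return tagger.tag(text.split(" "));
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<List<String>>> results = new ArrayList<>();
            for (int i = 0; i < 8; i++) {
                results.add(pool.submit(() -> handleRequest("dogs bark loudly")));
            }
            for (Future<List<String>> result : results) {
                System.out.println(result.get());
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each call to handleRequest builds its own Tagger, no locking or
ThreadLocal is needed in this sketch; only the read-only model is shared
between threads.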
