Hi,

   I am a little confused. Why do you want to share an instance of a 
SentenceDetectorME across threads? Are your documents very long single 
sentences? I don’t think there is enough work for the SentenceDetectorME to 
make up for the cost of multithreading on 4 cores.

   Previously, I had multiple threads (each with a separate 
SentenceDetectorME/TokenizerME/POSTaggerME) work on different parts of a 
document. Have you considered decomposing the problem at the document level 
or higher instead of the sentence level? Maybe you could use a regex to break 
the document into paragraphs and have the threads work on the paragraphs.

Daniel
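Daniel's suggestion of splitting at the paragraph level and giving each worker its own detector could be sketched in Java roughly as follows. `ToySentenceSplitter` and `ParagraphPipeline` are hypothetical: the splitter is a regex-based stand-in for `SentenceDetectorME` so the example runs without a model file. In real code each worker would instead construct its own `SentenceDetectorME`, presumably over one shared, already loaded model (an assumption worth checking against the OpenNLP documentation). Treating a blank line as the paragraph boundary is likewise an assumption about the input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Pattern;

class ParagraphPipeline {
    // Assumption: paragraphs are separated by a blank line (LF or CRLF).
    private static final Pattern PARAGRAPH_BREAK = Pattern.compile("\\n\\s*\\n");

    // One splitter per worker thread; no instance is ever shared between threads.
    private static final ThreadLocal<ToySentenceSplitter> SPLITTER =
            ThreadLocal.withInitial(ToySentenceSplitter::new);

    static List<String> sentences(String document, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String paragraph : PARAGRAPH_BREAK.split(document.trim())) {
                // Each paragraph is an independent task.
                futures.add(pool.submit(() -> SPLITTER.get().split(paragraph)));
            }
            List<String> all = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                all.addAll(f.get()); // waits for the task; preserves document order
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        String doc = "First paragraph. It has two sentences.\n\nSecond one! Done?";
        System.out.println(sentences(doc, 4));
    }
}

/** Hypothetical stand-in for a detector that is not thread safe, such as SentenceDetectorME. */
class ToySentenceSplitter {
    private static final Pattern SENTENCE_END = Pattern.compile("(?<=[.!?])\\s+");
    private int lastSentenceCount; // per-call state kept in a field, mimicking the pattern in this thread

    List<String> split(String paragraph) {
        List<String> sentences = Arrays.asList(SENTENCE_END.split(paragraph.trim()));
        lastSentenceCount = sentences.size();
        return sentences;
    }
}
```

A `ThreadLocal` gives one splitter per pool thread rather than one per paragraph, so the number of detector instances stays bounded by the thread count. Collecting results from the futures in submission order returns the sentences in document order.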

On 1/11/17, 5:05 AM, "Thilo Goetz" <[email protected]> wrote:

    Hi,
    
    in a recent project, I was using SentenceDetectorME, TokenizerME and 
    POSTaggerME. It turns out that none of those is thread safe. This is 
    because the classification probabilities for the last tag() call (for 
    example) are stored in a member variable and can be retrieved by a 
    separate API call.
    
    I'm planning to build thread safe versions for myself, and I'd be happy 
    to contribute a patch if there is interest. This could be done as a 
    conservative extension with an additional method such as tagReentrant, 
    where the old API calls would continue to work as before and would still 
    not be thread safe. Alternatively, one could remodel the API so that 
    everything was thread safe, but that would break backwards compatibility.
    
    Final question: if I do this for the classes mentioned above, are there 
    other tools that should be made thread safe while we're at it?
    
    Opinions?
    
    --Thilo
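To make the race Thilo describes concrete, here is a minimal Java sketch. `StatefulTagger` and `RaceDemo` are hypothetical stand-ins written for illustration, not OpenNLP classes. They only mimic the pattern of returning tags from `tag()` while stashing the matching probabilities in a field that a second call reads back. The calls from two threads are interleaved by hand so the outcome is deterministic.

```java
class RaceDemo {
    public static void main(String[] args) {
        StatefulTagger shared = new StatefulTagger();

        // The interleaving below is written out by hand to make it deterministic;
        // with real threads sharing one instance it can happen at any time.
        String[] tagsA = shared.tag(new String[] {"Hello", "world"});   // thread A
        shared.tag(new String[] {"one", "two", "three"});               // thread B
        double[] probsA = shared.probs();                               // thread A again

        // Thread A receives thread B's probabilities: 2 tags but 3 scores.
        System.out.println(tagsA.length + " tags, " + probsA.length + " probabilities");
    }
}

/**
 * Hypothetical stand-in, not OpenNLP code: returns tags from tag() but keeps
 * the matching probabilities in a field that a second call reads back.
 */
class StatefulTagger {
    private double[] lastProbs = new double[0]; // shared, mutable per-call state

    String[] tag(String[] tokens) {
        String[] tags = new String[tokens.length];
        double[] probs = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Toy rule in place of a statistical model.
            tags[i] = Character.isUpperCase(tokens[i].charAt(0)) ? "NNP" : "NN";
            probs[i] = 0.5;
        }
        lastProbs = probs; // overwritten by every call, from any thread
        return tags;
    }

    double[] probs() {
        return lastProbs;
    }
}
```

Run as written, this prints `2 tags, 3 probabilities`: thread A asks for the scores of its two tokens and gets the three left behind by thread B. Giving each thread its own instance, as in Daniel's setup, sidesteps this without changing the API.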
    
    
    

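Thilo's conservative extension could look roughly like the following sketch. `ReentrantTagger` and its `Result` type are hypothetical; only the method name `tagReentrant` comes from the proposal. The new method keeps all per-call data in local variables and returns tags and probabilities together, while the legacy `tag()`/`probs()` pair is kept unchanged (and still not thread safe) for backwards compatibility. The sketch assumes the underlying model is only read during tagging.

```java
import java.util.Arrays;

class ReentrantDemo {
    public static void main(String[] args) {
        ReentrantTagger tagger = new ReentrantTagger();
        ReentrantTagger.Result r = tagger.tagReentrant(new String[] {"Hello", "world"});
        System.out.println(Arrays.toString(r.tags()) + " " + Arrays.toString(r.probs()));
    }
}

/** Hypothetical tagger that keeps the legacy API and adds tagReentrant(). */
class ReentrantTagger {
    /** Immutable tags-plus-probabilities pair for one call (hypothetical type). */
    static final class Result {
        private final String[] tags;
        private final double[] probs;

        Result(String[] tags, double[] probs) {
            this.tags = tags;
            this.probs = probs;
        }

        // Defensive copies keep a Result immutable once returned.
        String[] tags() { return tags.clone(); }
        double[] probs() { return probs.clone(); }
    }

    private double[] lastProbs = new double[0]; // used only by the legacy API

    /** Legacy API: behaves as before and is still not thread safe. */
    String[] tag(String[] tokens) {
        Result r = tagReentrant(tokens);
        lastProbs = r.probs();
        return r.tags();
    }

    /** Legacy API: probabilities of the most recent tag() call on this instance. */
    double[] probs() {
        return lastProbs;
    }

    /** New API: writes no fields, so concurrent calls do not interfere (assuming a read-only model). */
    Result tagReentrant(String[] tokens) {
        String[] tags = new String[tokens.length];
        double[] probs = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Toy rule in place of a statistical model.
            tags[i] = Character.isUpperCase(tokens[i].charAt(0)) ? "NNP" : "NN";
            probs[i] = 0.5;
        }
        return new Result(tags, probs);
    }
}
```

The backwards-incompatible alternative would essentially make a `tagReentrant`-style return value the only way to obtain probabilities and drop the stored field altogether.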