krickert commented on PR #1003:
URL: https://github.com/apache/opennlp/pull/1003#issuecomment-4157829114

   Summary since first review:
   
   Made all 7 ME classes thread-safe by eliminating shared mutable instance state. Deprecated the `ThreadSafe*ME` wrappers; users can now share ME instances directly.
   
   ## Motivation
   
   ME classes were documented as not thread-safe because their mutable instance fields could be corrupted under concurrent access. The workarounds were to create a new ME instance per call (expensive) or to use the `ThreadSafe*ME` wrappers (ThreadLocal-based, leak-prone in Jakarta EE). This PR makes the ME classes themselves thread-safe; enabling instance reuse yields a **2.52x throughput improvement** for POSTagger (JMH, 32 threads).
   
   ## Approach
   
   Mutable state was moved to method-local variables or per-thread caches (`ThreadLocal`) at every layer:
   
   | Layer | Change |
   | --- | --- |
   | **ME classes** (all 7) | Result fields (`bestSequence`, `tokProbs`, etc.) made `volatile`; processing uses method-local variables with atomic swap at end |
   | **BeamSearch** | `probs[]` buffer and `contextsCache` moved to per-thread `ThreadLocal` state |
   | **CachedFeatureGenerator** | Cache moved to per-thread `ThreadLocal` (JMH confirms 1.62x benefit) |
   | **ConfigurablePOSContextGenerator** | Cache moved to per-thread `ThreadLocal` |
   | **DefaultSDContextGenerator** | `buf`/`collectFeats` moved to method-local parameters |
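
For readers who have not looked at the diff, the patterns in the table can be sketched roughly as follows. This is a minimal illustration, not code from the PR: `SharedInstanceDemo`, `SketchTagger`, `lastProbs`, and the toy tagging rule are all hypothetical stand-ins, and only the shape (method-local working state, one `volatile` reference written at the end, a per-thread `ThreadLocal` cache) is meant to mirror the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SharedInstanceDemo {

  // Hypothetical stand-in for an ME class; not OpenNLP code.
  static final class SketchTagger {
    // Last result, kept only for getter-style access. It is published as a
    // single reference, so a reader never sees a partially filled array.
    private volatile double[] lastProbs = new double[0];

    // Per-thread cache: each thread gets its own map, so no locking is needed.
    private final ThreadLocal<Map<String, Integer>> cache =
        ThreadLocal.withInitial(HashMap::new);

    String[] tag(String[] tokens) {
      // All working state is method-local; concurrent calls cannot interfere.
      String[] tags = new String[tokens.length];
      double[] probs = new double[tokens.length];
      Map<String, Integer> localCache = cache.get();
      for (int i = 0; i < tokens.length; i++) {
        int len = localCache.computeIfAbsent(tokens[i], String::length);
        tags[i] = (len % 2 == 0) ? "EVEN" : "ODD"; // toy rule, for illustration
        probs[i] = 1.0 / (1 + len);
      }
      lastProbs = probs; // single reference write at the end
      return tags;
    }

    double[] probs() {
      return lastProbs.clone();
    }
  }

  // Hammers one shared instance from several threads and checks every result.
  static boolean runConcurrently(int threads, int iterations) {
    SketchTagger shared = new SketchTagger();
    String[] input = {"one", "four", "three"};
    String[] expected = {"ODD", "EVEN", "ODD"};
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Boolean>> results = new ArrayList<>();
      for (int t = 0; t < threads; t++) {
        results.add(pool.submit(() -> {
          for (int i = 0; i < iterations; i++) {
            if (!Arrays.equals(expected, shared.tag(input))) {
              return false;
            }
          }
          return true;
        }));
      }
      boolean ok = true;
      for (Future<Boolean> f : results) {
        ok &= f.get();
      }
      return ok;
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(runConcurrently(8, 1000));
  }
}
```

One consequence of this shape worth keeping in mind: the value returned by `tag` is always consistent for the caller, but a later call to a getter such as `probs()` on a shared instance returns whichever thread's result was written last.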
   
   ### Files changed (30 total)
   
   **Source (13 files):** TokenizerME, SentenceDetectorME, POSTaggerME, 
LemmatizerME, ChunkerME, NameFinderME, LanguageDetectorME, BeamSearch, 
CachedFeatureGenerator, ConfigurablePOSContextGenerator, 
DefaultPOSContextGenerator, DefaultSDContextGenerator, SentenceContextGenerator 
(Thai)
   
   **Deprecated (7 files):** ThreadSafeTokenizerME, 
ThreadSafeSentenceDetectorME, ThreadSafePOSTaggerME, ThreadSafeLemmatizerME, 
ThreadSafeChunkerME, ThreadSafeNameFinderME, ThreadSafeLanguageDetectorME
   
   **Internal usage swaps (3 files):** Muc6NameSampleStreamFactory, 
TwentyNewsgroupSampleStreamFactory, POSTaggerMEIT - replaced `ThreadSafe*ME` 
with direct ME usage
   
   **Tests/benchmarks (5 files):** ThreadSafetyBenchmarkTest (8 JUnit tests), 3 
JMH benchmarks, CachedFeatureGeneratorTest update
   
   **Build (1 file):** pom.xml - fixed JMH annotation processor wiring
   

