Hi Greg,

Check your log to see which component is taking all the time.
There is a known problem with the cleartk assertion annotators: https://issues.apache.org/jira/browse/CTAKES-449

A partial fix was made in the "windowed" sub-package of ctakes-assertion: org.apache.ctakes.assertion.medfacts.cleartk.windowed. Each of the normal assertion engines has a replacement in the windowed package. If you are using a piper file that contains "load AttributeCleartkSubPipe", as the default clinical pipeline does, just replace that line with "load WindowedAttributeCleartkSubPipe".

It isn't a full fix for the problem, and I don't know whether it will make your processing faster, but you can give it a try.

Sean

________________________________________
From: Greg Silverman <g...@umn.edu>
Sent: Tuesday, September 24, 2019 6:47 PM
To: dev@ctakes.apache.org
Subject: Large files taking forever to process [EXTERNAL]

Any suggestions on how to speed up processing of large clinical text notes approaching 13K lines? This is a very old corpus culled from EPIC notes back in 2009. I thought about splitting the notes into smaller chunks, but then I would have to deal with the offsets when comparing system output against the manual annotations that had already been done. So far, I've tried different garbage collection options (this seemed to work well with CLAMP on the same set of notes).

TIA!
Greg

--
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Department of Surgery
University of Minnesota
g...@umn.edu

› evaluate-it.org ‹
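On the offset concern Greg raises about splitting notes into chunks: the bookkeeping can stay small if each chunk remembers where it starts in the original note. Below is a minimal sketch, assuming chunks are cut on line boundaries. `NoteChunker`, `Chunk`, `split` and `toNoteOffset` are hypothetical names for illustration; they are not part of cTAKES or UIMA.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not part of cTAKES or UIMA: splits a long note into
 * chunks on line boundaries and records where each chunk starts, so that
 * offsets reported for a chunk can be mapped back to the original note.
 */
final class NoteChunker {

    /** One chunk of the original note and its start offset in that note. */
    static final class Chunk {
        final String text;
        final int startOffset;

        Chunk(String text, int startOffset) {
            this.text = text;
            this.startOffset = startOffset;
        }

        /** Shift an offset that is local to this chunk into note coordinates. */
        int toNoteOffset(int localOffset) {
            return startOffset + localOffset;
        }
    }

    private NoteChunker() {}

    /**
     * Splits the note into chunks of at most maxLines lines. Each newline
     * stays with the line it ends, so the chunks concatenate back to the
     * exact original text and no characters are lost or shifted.
     */
    static List<Chunk> split(String note, int maxLines) {
        if (maxLines < 1) {
            throw new IllegalArgumentException("maxLines must be at least 1");
        }
        List<Chunk> chunks = new ArrayList<>();
        int chunkStart = 0;
        int linesInChunk = 0;
        for (int i = 0; i < note.length(); i++) {
            // Count a line only when we see its newline.
            if (note.charAt(i) == '\n' && ++linesInChunk == maxLines) {
                // Close the current chunk just after this newline.
                chunks.add(new Chunk(note.substring(chunkStart, i + 1), chunkStart));
                chunkStart = i + 1;
                linesInChunk = 0;
            }
        }
        if (chunkStart < note.length()) {
            // Trailing lines that did not fill a whole chunk.
            chunks.add(new Chunk(note.substring(chunkStart), chunkStart));
        }
        return chunks;
    }
}
```

Each chunk's text would then be run as its own document, and a span [begin, end) found in a chunk maps to [chunk.toNoteOffset(begin), chunk.toNoteOffset(end)) in the original note, which is what the manual annotations are keyed to. The trade-off is that sentences or context windows crossing a chunk boundary get cut, which could change results for context-sensitive components such as assertion. Cutting on blank lines or section headers instead of a fixed line count might reduce that.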