Hi Greg,

Check your log to see which component is taking most of the time.

There is a known problem with the ClearTK assertion annotators:

https://issues.apache.org/jira/browse/CTAKES-449

A partial fix was made in the "windowed" sub-package of ctakes-assertion: 
org.apache.ctakes.assertion.medfacts.cleartk.windowed.

Each of the normal assertion engines has a replacement in the windowed package.

If you are using a piper file that contains "load AttributeCleartkSubPipe", as 
the default clinical pipeline does, just replace that line with "load 
WindowedAttributeCleartkSubPipe".
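
In other words, the change to the piper file would look roughly like the 
sketch below. Only the two "load" lines come from the description above; the 
"//" lines are just annotations for this example, and everything else in your 
piper file is assumed to stay exactly as it is.

    // before: the normal ClearTK assertion engines
    load AttributeCleartkSubPipe

    // after: the replacements from the windowed package
    load WindowedAttributeCleartkSubPipe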

It isn't a full fix for the problem, and I don't know if it will make your 
processing faster, but you can give it a try.

Sean

________________________________________
From: Greg Silverman <g...@umn.edu>
Sent: Tuesday, September 24, 2019 6:47 PM
To: dev@ctakes.apache.org
Subject: Large files taking forever to process [EXTERNAL]

Any suggestions on how to speed up processing large clinical text notes
approaching 13K lines? This is a very old corpus culled from EPIC notes
back in 2009. I thought about splitting the notes into smaller chunks, but
then I would have to deal with the offsets when analyzing system output
against manual annotations that had been done.

As is, I've tried different garbage collection options (this seemed to have
worked well with CLAMP on the same set of notes).

TIA!

Greg--

--
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Department of Surgery
University of Minnesota
g...@umn.edu

 ›  evaluate-it.org  ‹
