Hi Sean, I just ran another set of notes through cTAKES and noticed the following error:
log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy HH:mm:ss} %5p %c{1} - %m%n].
log4j: Adding appender named [consoleAppender] to category [root].
29 Sep 2019 15:31:21 ERROR PiperFileReader - Piper File not found: WindowedAttributeCleartkSubPipe

Is something missing? This is what my DefaultFastPipeline.piper file looks like (NB: I also tried "load WindowedAttributeCleartkSubPipe.piper", with similar results):

// Commands and parameters to create a default plaintext document processing pipeline with UMLS lookup

// Load a simple token processing pipeline from another pipeline file
load DefaultTokenizerPipeline.piper

// Add non-core annotators
add ContextDependentTokenizerAnnotator
addDescription POSTagger

// Add Chunkers
load ChunkerSubPipe.piper

// Default fast dictionary lookup
add DefaultJCasTermAnnotator

// Add Cleartk Entity Attribute annotators
// see https://issues.apache.org/jira/browse/CTAKES-449
//load AttributeCleartkSubPipe.piper
load WindowedAttributeCleartkSubPipe

All files seem to have been processed fine, but I'm wondering whether something was missed because of the error. If so, how do I construct the WindowedAttributeCleartkSubPipe.piper file?

Thanks very much in advance!

Greg--

On Tue, Sep 24, 2019 at 7:27 PM Greg Silverman <g...@umn.edu> wrote:

> Sweet! That was definitely it! It's flying now (granted, our files are not
> in the >1 MB realm like in the Jira issue, just in the nnn KB realm, but
> still!).
>
> Mahalo nui loa (thank you very much)!
>
> On Tue, Sep 24, 2019 at 6:29 PM Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:
>
>> Hi Greg,
>>
>> Check your log to see what component is taking all the time.
>>
>> There is a known problem with the cleartk assertion annotators:
>>
>> https://issues.apache.org/jira/browse/CTAKES-449
>>
>> A partial fix was made in the "windowed" sub-package of ctakes-assertion:
>> org.apache.ctakes.assertion.medfacts.cleartk.windowed.
>>
>> Each of the normal assertion engines has a replacement in the windowed
>> package.
>>
>> If you are using a piper file that contains "load
>> AttributeCleartkSubPipe", as the Default clinical pipeline does, just
>> replace it with "load WindowedAttributeCleartkSubPipe".
>>
>> It isn't a full fix for the problem, and I don't know if it will make
>> your processing faster, but you can give it a try.
>>
>> Sean
>>
>> ________________________________________
>> From: Greg Silverman <g...@umn.edu>
>> Sent: Tuesday, September 24, 2019 6:47 PM
>> To: dev@ctakes.apache.org
>> Subject: Large files taking forever to process [EXTERNAL]
>>
>> Any suggestions on how to speed up processing large clinical text notes
>> approaching 13K lines? This is a very old corpus culled from EPIC notes
>> back in 2009. I thought about splitting the notes into smaller chunks, but
>> then I would have to deal with the offsets when analyzing system output
>> against manual annotations that had been done.
>>
>> As is, I've tried different garbage collection options (this seemed to
>> have worked well with CLAMP on the same set of notes).
>>
>> TIA!
>>
>> Greg--
>>
>> --
>> Greg M. Silverman
>> Senior Systems Developer
>> NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
>> Department of Surgery
>> University of Minnesota
>> g...@umn.edu
>>
>> › evaluate-it.org ‹
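Greg's original question raises the usual objection to chunking long notes: annotations produced on a chunk carry chunk-relative offsets. One way to handle that, sketched here in Python under stated assumptions, is to split each note only at line ends, record the character offset where each chunk starts, and add that offset back to every (begin, end) span afterwards. The helper names (`Chunk`, `split_note`, `to_note_offsets`) and the `max_chars` limit are hypothetical and not part of cTAKES. The sketch assumes offsets are reported against the chunk text exactly as written (no newline or whitespace normalization by the reader), and it counts Python characters (code points), which differ from Java's UTF-16 offsets only for characters outside the Basic Multilingual Plane. It does nothing about context that spans a cut point, so annotators that look at surrounding text may behave differently near chunk boundaries.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    text: str    # the chunk's text, to be written out and processed on its own
    offset: int  # character offset of the chunk's first character in the full note


def split_note(note: str, max_chars: int = 50_000) -> List[Chunk]:
    """Hypothetical helper: split a note into chunks of at most max_chars
    characters, breaking only at line ends, and remember where each chunk
    starts in the original note. A single line longer than max_chars
    becomes its own oversized chunk."""
    chunks: List[Chunk] = []
    start = 0    # offset where the chunk being built begins
    current = 0  # offset just past the last line added so far
    # keepends=True keeps the line terminators, so the lines concatenate
    # back to exactly the original note and the offsets stay exact.
    for line in note.splitlines(keepends=True):
        # Close the current (non-empty) chunk if this line would push it past max_chars.
        if current > start and (current - start) + len(line) > max_chars:
            chunks.append(Chunk(note[start:current], start))
            start = current
        current += len(line)
    if current > start:
        chunks.append(Chunk(note[start:current], start))
    return chunks


def to_note_offsets(chunk: Chunk, spans: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Hypothetical helper: map (begin, end) spans found in a chunk back to
    offsets in the full note by adding the chunk's starting offset."""
    return [(begin + chunk.offset, end + chunk.offset) for begin, end in spans]


if __name__ == "__main__":
    note = "Patient denies chest pain.\nNo fever.\nHistory of diabetes.\n"
    chunks = split_note(note, max_chars=30)
    print([c.offset for c in chunks])  # [0, 27, 37]
    # "diabetes" sits at (11, 19) inside the third chunk's own text.
    print(to_note_offsets(chunks[2], [(11, 19)]))  # [(48, 56)]
```

Cutting at line ends rather than at a fixed character count keeps every line intact; cutting at section or paragraph boundaries instead would likely reduce boundary effects further, at the cost of a more note-specific splitter.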