Sean's fix did the trick. Thanks for the suggestion, though. I'm wondering how this would work with our custom implementation of MetaMap with UIMA-AS (it is SLOW as molasses).
Best!

Greg--

On Tue, Sep 24, 2019 at 7:18 PM Petersam, John Contractor <john.peter...@ssa.gov> wrote:

> Hi Greg,
>
> We regularly process documents that are over 5000 pages (not lines). What
> we've found is that many of the annotators within the standard
> distribution operate at O(n^2). The standard dependency parser is one
> example among many.
>
> The good news is that you can achieve linear results if you convert these
> classes to use TreeMaps. We build the TreeMaps once and cache them in
> ThreadLocal variables, which lets us process documents on multiple
> threads simultaneously.
>
> Hope this helps,
> John
>
> -----Original Message-----
> From: Greg Silverman <g...@umn.edu>
> Sent: Tuesday, September 24, 2019 6:47 PM
> To: dev@ctakes.apache.org
> Subject: [EXTERNAL] Large files taking forever to process
>
> Any suggestions on how to speed up processing of large clinical text
> notes approaching 13K lines? This is a very old corpus culled from EPIC
> notes back in 2009. I thought about splitting the notes into smaller
> chunks, but then I would have to deal with adjusting the offsets when
> analyzing system output against the manual annotations that had already
> been done.
>
> As it is, I've tried different garbage collection options (this seemed to
> have worked well with CLAMP on the same set of notes).
>
> TIA!
>
> Greg--
>
> --
> Greg M. Silverman
> Senior Systems Developer
> NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
> Department of Surgery
> University of Minnesota
> g...@umn.edu
>
> › evaluate-it.org ‹

--
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Department of Surgery
University of Minnesota
g...@umn.edu

› evaluate-it.org ‹
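
A minimal sketch of the ThreadLocal-cached TreeMap pattern John describes
above. All class, field, and method names here are illustrative
assumptions, not actual cTAKES or UIMA APIs; the point is only that the
index is built once per document per thread, and each subsequent range
lookup costs O(log n) rather than a linear rescan of the annotation list:

    import java.util.List;
    import java.util.TreeMap;

    // Hypothetical annotation type: begin/end character offsets plus text.
    final class Annotation {
        final int begin;
        final int end;
        final String text;
        Annotation(int begin, int end, String text) {
            this.begin = begin; this.end = end; this.text = text;
        }
    }

    final class AnnotationIndex {
        // One TreeMap per worker thread, created lazily on first use.
        // Keyed by begin offset so range queries become log-time slices
        // instead of nested-loop scans over the full annotation list.
        private static final ThreadLocal<TreeMap<Integer, Annotation>> INDEX =
                ThreadLocal.withInitial(TreeMap::new);

        // Build the index once for this thread's current document.
        static void build(List<Annotation> annotations) {
            TreeMap<Integer, Annotation> map = INDEX.get();
            map.clear();
            for (Annotation a : annotations) {
                map.put(a.begin, a);
            }
        }

        // All annotations beginning inside [begin, end): an O(log n)
        // subMap slice rather than an O(n) pass per query.
        static Iterable<Annotation> startingWithin(int begin, int end) {
            return INDEX.get().subMap(begin, true, end, false).values();
        }
    }

Note that real annotations can share a begin offset, so a production
version would map each key to a list of annotations; this sketch keeps one
per key for brevity.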
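
And for the chunking idea from Greg's original message, a sketch of one
way to split a long note while keeping each chunk's base offset, so spans
found in a chunk can be shifted back to document coordinates for
comparison against the existing manual annotations (again, hypothetical
helper types, not a cTAKES API):

    import java.util.ArrayList;
    import java.util.List;

    final class Chunker {
        // A chunk remembers where it started in the original note, so any
        // offsets produced against the chunk can be shifted back later.
        record Chunk(int baseOffset, String text) {}

        // Split near targetSize characters, backing up to the previous
        // newline so no line is cut in half, recording each start offset.
        static List<Chunk> split(String note, int targetSize) {
            List<Chunk> chunks = new ArrayList<>();
            int start = 0;
            while (start < note.length()) {
                int end = Math.min(start + targetSize, note.length());
                int nl = note.lastIndexOf('\n', end - 1);
                if (nl > start && end < note.length()) end = nl + 1;
                chunks.add(new Chunk(start, note.substring(start, end)));
                start = end;
            }
            return chunks;
        }

        // Map a span found within a chunk back to document coordinates.
        static int[] toDocumentSpan(Chunk chunk, int begin, int end) {
            return new int[] { chunk.baseOffset() + begin,
                               chunk.baseOffset() + end };
        }
    }

Usage would look like split(noteText, 20000), running the pipeline on each
chunk, then shifting every begin/end through toDocumentSpan before scoring
against the manual annotations.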