I'm running the default pipeline on some large files and trying to fix
some of the slower annotators. I changed ChunkAdjuster to use UimaFit
selectors, which dramatically improves its speed on large files. I
removed the OverlapAnnotator, with its complicated interface and
extreme generality, from my pipeline altogether and replaced it with a
3-line static annotator. I think we should consider doing that for the
default pipeline, even if we think there are good reasons to keep the
general-purpose annotator around.
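
For concreteness, here's roughly the shape of that replacement. This is
a sketch rather than my exact code: it assumes the only behavior the
default pipeline needs from OverlapAnnotator is "drop any annotation
strictly covered by a larger annotation of the same type", and the
class name is made up:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
    import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
    import org.apache.uima.fit.util.JCasUtil;
    import org.apache.uima.jcas.JCas;
    import org.apache.uima.jcas.tcas.Annotation;

    public class RemoveCoveredAnnotator extends JCasAnnotator_ImplBase {
      @Override
      public void process(JCas jCas) throws AnalysisEngineProcessException {
        // Collect first, remove after, so we don't modify the indexes
        // while the selectors are iterating over them.
        List<Annotation> covered = new ArrayList<>();
        // The UimaFit selectors do the index walking for us.
        for (Annotation outer : JCasUtil.select(jCas, Annotation.class)) {
          for (Annotation inner : JCasUtil.selectCovered(Annotation.class, outer)) {
            // Only drop same-type annotations with a strictly smaller span.
            if (inner.getClass() == outer.getClass()
                && (inner.getBegin() != outer.getBegin()
                    || inner.getEnd() != outer.getEnd())) {
              covered.add(inner);
            }
          }
        }
        for (Annotation a : covered) {
          a.removeFromIndexes();
        }
      }
    }

The core logic is the few lines in the middle; the rest is UimaFit
boilerplate.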
Anyway, now I'm at the dictionary lookup, which I suspect will be the
slowest component. One call is to getContextMap(), which seems
especially slow. It is called for every LookupWindow and, given the
span of that window, iterates over all LookupWindows looking for one
with an equivalent span. So in the end you give it a lookup window and
it basically gives you the same one back. Of course the code is
written very generally, so there may be use cases where the types are
different, but for the default case it seems a little weird for
something that does nothing to take so long.
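
To make the optimization I have in mind concrete: build a
span-to-window index once per CAS, then answer each lookup with a hash
get instead of a fresh scan over all the windows. Again, this is only
a sketch under my assumptions: I'm using the generic Annotation type
rather than guessing at the window type's package, and if some
configuration allows two windows with the same span the map value
would need to be a list:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.uima.fit.util.JCasUtil;
    import org.apache.uima.jcas.JCas;
    import org.apache.uima.jcas.tcas.Annotation;

    public final class SpanIndex {

      private final Map<Long, Annotation> windowsBySpan = new HashMap<>();

      // Built once per CAS: one pass over the windows, instead of one
      // pass per getContextMap()-style call.
      public SpanIndex(JCas jCas, Class<? extends Annotation> windowType) {
        for (Annotation w : JCasUtil.select(jCas, windowType)) {
          windowsBySpan.put(key(w.getBegin(), w.getEnd()), w);
        }
      }

      // The window with exactly this span, or null if there isn't one.
      public Annotation get(int begin, int end) {
        return windowsBySpan.get(key(begin, end));
      }

      // Pack (begin, end) into one long so it can serve as a map key.
      private static long key(int begin, int end) {
        return ((long) begin << 32) | (end & 0xffffffffL);
      }
    }

As far as I can tell that turns the per-window cost from linear in the
number of windows to constant without changing behavior in the default
case, but that's exactly the kind of assumption I'd like to check.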
So my question is: does anyone know what the engineering goals of this
setup are? I think it can be optimized even within the super-general
framework it is trying to maintain, but I don't want to break anything
by making assumptions that aren't valid.
Thanks
Tim