Tim and Co:

Regarding CTAKES-174 (use sections for coreference resolution): I think sections should be normalized to a standard code (HL7 section template ID, LOINC, etc.); otherwise the trained models won't be portable across different section header names.

I spent a few hours last night on the mappings, and some simple code is now in the sandbox area: a RegEx cTAKES Sectionizer [1] and a user-configurable mapping file [2] that contains the HL7 section template IDs and their names.

[1] https://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-sectionizer/
[2] https://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-sectionizer/src/main/resources/org/apache/ctakes/core/sections/ccda_sections.txt
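
To make the normalization idea concrete, here is a minimal sketch in Python (for brevity; it is not the sandbox code, which is Java and may work differently). The pipe-delimited mapping format, the header variants, the function names, and the header regex are all assumptions for illustration, and the template IDs shown should be checked against the mapping file itself.

```python
import re

# Hypothetical mapping format (an assumption, not the real ccda_sections.txt layout):
#   <HL7 template id>|<canonical section name>|<comma-separated header variants>
MAPPING_TEXT = """\
# illustrative entries only
2.16.840.1.113883.10.20.22.2.6.1|Allergies|allergies,allergy list,adverse reactions
2.16.840.1.113883.10.20.22.2.1.1|Medications|medications,current medications,meds
2.16.840.1.113883.10.20.22.2.5.1|Problem List|problem list,problems,active problems
"""


def load_mapping(text):
    """Build a lookup from lower-cased header variant to (template id, name)."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        template_id, name, variants = line.split("|")
        for variant in variants.split(","):
            mapping[variant.strip().lower()] = (template_id, name)
    return mapping


# Assumed header shape: a short phrase alone on its line, ending with a colon.
HEADER_RE = re.compile(r"^[ \t]*([A-Za-z][A-Za-z /]*?)[ \t]*:[ \t]*$", re.MULTILINE)


def sectionize(note, mapping):
    """Return sections keyed by normalized code rather than by raw header text."""
    # Keep only header-like lines that the mapping recognizes.
    hits = []
    for match in HEADER_RE.finditer(note):
        code = mapping.get(match.group(1).lower())
        if code is not None:
            hits.append((match, code))

    sections = []
    for i, (match, (template_id, name)) in enumerate(hits):
        # A section runs until the next recognized header, or to the end of the note.
        end = hits[i + 1][0].start() if i + 1 < len(hits) else len(note)
        sections.append({
            "template_id": template_id,
            "name": name,
            "begin": match.end(),
            "end": end,
            "text": note[match.end():end].strip(),
        })
    return sections


if __name__ == "__main__":
    note = "MEDS:\naspirin 81 mg daily\n\nAllergy List:\npenicillin\n"
    for section in sectionize(note, load_mapping(MAPPING_TEXT)):
        print(section["template_id"], section["name"], repr(section["text"]))
```

In this sketch, "MEDS:" and "Current Medications:" would both resolve to the same template ID, so a model trained on the normalized code never sees the site-specific header wording. Headers that are not in the mapping are ignored, and their text stays in the preceding section.
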

If it works, I think we should replace the current core/sectionizer... [The RegEx still needs a bit of work to identify the sections, but I hope that can be fixed.]

--Pei
