I've finished another pass through the reader that takes the SHARP Knowtator 
data and reads it into the cTAKES UIMA type system. The class is:

org.apache.ctakes.core.ae.SHARPKnowtatorXMLReader

If you take a look at that, you'll see a ton of TODO notes and warnings, where 
I couldn't figure out how to map the Knowtator annotations to the cTAKES UIMA 
annotations. Here's a list of issues:

* I couldn't find an entity type for "Clinical_attribute", "Devices", "Lab", 
"Phenomena"

* I couldn't find a modifier type (or alternatively, an Annotation subclass) 
for the Knowtator annotations "generic_class", "conditional_class", 
"uncertainty_indicator_class", "distal_or_proximal", "Person", 
"negation_indicator_class", "historyOf_indicator_class", 
"superior_or_inferior", "medial_or_lateral", "dorsal_or_ventral", 
"method_class", "device_class", "allergy_indicator_class", "Route", "Form", 
"Strength", "Strength number", "Strength unit", "Frequency", "Frequency 
number", "Frequency unit", "Value", "Value number", "Value unit", 
"estimated_flag_indicator", "reference_range", "Date", "Status change", 
"Duration", "Dosage".

* I couldn't find a place for the normalized value of "generic_class", 
"conditional_class", "uncertainty_indicator_class", "distal_or_proximal", 
"Person", "negation_indicator_class", "superior_or_inferior", 
"medial_or_lateral", "dorsal_or_ventral", "device_class", 
"allergy_indicator_class", "lab_interpretation_indicator", 
"estimated_flag_indicator"

* I couldn't find a place for the "associatedCode" of a "Person" or 
"historyOf_indicator_class"

* There were several things in the Knowtator annotations whose meaning I 
couldn't even guess: "Attributes_lab", "Temporal", ":THING", "Entities".

After working with this data I think we should consider having separate UIMA 
Annotation sub-types for each of the things that are Modifiers now. For 
example, if we have a real Severity Annotation for textual mentions of 
severity, then the CAS makes it easy to select these. We have exactly this use 
case in the relation extractor - we need just the Severity modifiers, excluding all 
the other modifiers. Basically, I think the principle we should follow in UIMA 
is:

"If you could imagine searching the CAS for something, then that something 
should have its own Annotation sub-type."
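
To make the difference concrete, here is a minimal sketch in plain Java. It 
does not use the UIMA API: the Annotation, Modifier, Severity and BodySide 
classes, the "kind" field, and the select() helper are all hypothetical 
stand-ins, with select() playing the role of a by-type lookup over the CAS 
(in real code something like uimaFIT's JCasUtil.select would presumably fill 
that role).

```java
import java.util.ArrayList;
import java.util.List;

public class ModifierSelectionSketch {

    // Stand-in for a span-based annotation; NOT the real UIMA Annotation class.
    static class Annotation {
        final int begin;
        final int end;
        Annotation(int begin, int end) { this.begin = begin; this.end = end; }
    }

    // Generic style: one Modifier type, with the kind kept in a field
    // (the field name "kind" is an assumption made for illustration).
    static class Modifier extends Annotation {
        final String kind;
        Modifier(int begin, int end, String kind) { super(begin, end); this.kind = kind; }
    }

    // Proposed style: one Annotation sub-type per kind of modifier.
    static class Severity extends Annotation {
        Severity(int begin, int end) { super(begin, end); }
    }

    static class BodySide extends Annotation {
        BodySide(int begin, int end) { super(begin, end); }
    }

    // Stand-in for a by-type lookup over the CAS.
    static <T extends Annotation> List<T> select(List<? extends Annotation> cas, Class<T> type) {
        List<T> result = new ArrayList<>();
        for (Annotation a : cas) {
            if (type.isInstance(a)) {
                result.add(type.cast(a));
            }
        }
        return result;
    }

    // Generic style: select every Modifier, then filter by hand on the field.
    static List<Modifier> selectModifiersOfKind(List<? extends Annotation> cas, String kind) {
        List<Modifier> result = new ArrayList<>();
        for (Modifier m : select(cas, Modifier.class)) {
            if (kind.equals(m.kind)) {
                result.add(m);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Annotation> genericCas = new ArrayList<>();
        genericCas.add(new Modifier(0, 6, "severity"));
        genericCas.add(new Modifier(7, 11, "body_side"));

        List<Annotation> typedCas = new ArrayList<>();
        typedCas.add(new Severity(0, 6));
        typedCas.add(new BodySide(7, 11));

        // Generic style needs a second filtering step with a string constant.
        System.out.println(selectModifiersOfKind(genericCas, "severity").size());
        // Sub-type style: the type alone identifies what we want.
        System.out.println(select(typedCas, Severity.class).size());
    }
}
```

With sub-types, the caller asks for Severity directly and gets back a list 
that is already typed, instead of selecting every Modifier and checking a 
field on each one.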

So, I think we need Annotation sub-types (not TOP sub-types) for:

// linguistic phenomena
Generic
Conditional
Negation
Uncertainty
Estimated
HistoryOf
Person

// for disease/disorder/sign/symptom
Course
BodyLaterality (covering distal_or_proximal, superior_or_inferior, etc.)
BodySide

// for procedure
ProcedureMethod
ProcedureDevice

// for medication
MedicationAllergyIndicator
MedicationDosage
MedicationDuration
MedicationForm
MedicationFrequency
MedicationRoute
MedicationStartDate (maybe?)
MedicationStatusChange
MedicationStrength

// for lab
LabValue
LabInterpretation
LabReferenceRange

Steve

P.S. SHARPKnowtatorXMLReader can parse all the UMLS_CEM data that's on the 
cloud right now. So once all these type system issues get sorted out, it should 
be pretty much ready to go.
