You're not needlessly worried. This is indeed a challenge of making data useful in i2b2.
The lesson I continually re-learn is: "Start by defining use cases, not ontologies" -- Building Ontologies: Best practices, pitfalls and positives by Bodenreider, 2009<http://mor.nlm.nih.gov/pubs/pres/20090504-CBO.pdf>. "By way of User Centred Design Knowledge Representation / ontologies was a solution, not a goal" -- Developing Biomedical Ontologies in OWL by Alan Rector<http://ontology.buffalo.edu/07/os3/Rector_OWL.pdf>

The more you can anticipate how people will want to use the data (preferably based on experience), the more usable you can make it.

It occurs to me that while flowsheet terminology is at the other end of the spectrum of standardization from SNOMED, the structure we found is also essentially a huge polyhierarchy: pulse (flow measure #8) shows up in many flowsheets. We had little to go on as far as what people would want, so we sort of gave them everything. In our i2b2 representation, pulse shows up under each of the flowsheets in which it occurs, each of those flowsheets shows up under each department where it's used, and so on. The usability of the result is... well... you can imagine.

We did some research on automated clustering to remove redundancy:

* Expressing Observations from Electronic Medical Record Flowsheets in an i2b2-based Clinical Data Repository to Support Research and Quality Improvement<http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3243191/>
  L. Waitman, J. Warren, E. Manos, D. Connolly
  Abstract: While nursing documentation in electronic medical record (EMR) flowsheets may represent the largest investment of clinician time with information systems, organizations lack tools to visualize and repurpose this data for research and quality improvement. Incorporating flowsheet documentation into a clinical data repository and methods to reduce the flowsheet ontology's redundancy are described. 411 million flowsheet observations, derived from an EMR predominantly used in inpatient, outpatient oncology, and emergency room settings, were incorporated into a repository using the i2b2 framework. The local flowsheet ontology contained 720 "templates" employing 5,379 groups (2,678 distinct), 37,836 measures (13,659 distinct) containing 226,666 choices for a total size of 270,641. Aggressive pruning and clustering resulted in 150 templates, 743 groups (615 distinct), 6,950 measures (4,066 distinct) with 22,497 choices, and size of 30,371. Making nursing data accessible within i2b2 provides a new perspective for contributing clinical organizations and heightens collaboration between the academic and clinical activities.

Ah... good... it looks like the complementary work, using an expert curation approach, was published too:

* Ambient Findability: Developing a Flowsheet Ontology for i2B2<http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3799091/>
  Judith J. Warren, PhD, RN, BC, FAAN, FACMI; E. LaVerne Manos, MS, RN, BC; Daniel W. Connolly; and Lemuel R. Waitman, PhD
  Abstract: The process of moving from the locally defined flowsheet ontology containing redundancy and jargon to one understandable by researchers is described. Over 250 million nursing flowsheet observations were imported into a data repository that uses the i2b2 framework. Focus groups were used to derive a new ontology model--18 templates were identified. One hundred measures, 50% of all patient observations over 36 months, were encoded in SNOMED CT©. 78% of the concepts were mapped.

Related blog item:

* AMIA 2011: Nursing Flowsheets data and the wild west of terminology<https://informatics.kumc.edu/work/blog/2011/10/slug>

We're also struggling through "which end is up? or how many ends are up?" with microbiology.
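The gap between total and "distinct" counts in the first abstract (e.g. 37,836 measures, 13,659 distinct) presumably reflects the polyhierarchy described above: one measure such as pulse becomes a separate tree node under every flowsheet and department that uses it. A minimal sketch, assuming a hypothetical three-level layout (department, flowsheet, measure ID; none of these names or IDs come from our actual data, and this is not the clustering method from the paper), counting tree nodes versus distinct measures:

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical flowsheet layout: department -> flowsheet -> measure IDs.
# Names and IDs are illustrative only; 8 plays the role of "pulse".
LAYOUT: Dict[str, Dict[str, List[int]]] = {
    "Emergency": {"Vital Signs": [8, 9, 10], "Triage": [8, 12]},
    "Oncology": {"Vital Signs": [8, 9, 10]},
}


def leaf_paths(layout: Dict[str, Dict[str, List[int]]]) -> List[Tuple[str, str, int]]:
    """List every department/flowsheet/measure path: one tree node per occurrence."""
    return [
        (dept, sheet, measure)
        for dept, sheets in layout.items()
        for sheet, measures in sheets.items()
        for measure in measures
    ]


paths = leaf_paths(LAYOUT)
# Count how many tree nodes each distinct measure ID occupies.
occurrences = Counter(measure for _, _, measure in paths)

print(len(paths))        # measure nodes in the browsable tree
print(len(occurrences))  # distinct measures behind those nodes
print(occurrences[8])    # how many places "pulse" shows up
```

Here 8 tree nodes stand for only 4 distinct measures, and measure 8 appears under 3 paths; deeper nesting (units, templates, groups) multiplies that redundancy further.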
-- Dan

________________________________
From: Greater Plains Collaborative Software Development [[email protected]] on behalf of Campbell, James R [[email protected]]
Sent: Tuesday, February 04, 2014 6:04 PM
To: [email protected]
Subject: Re: optional columns in i2b2 dimension tables RE: Minutes of GPV-DEV call 20140128

Researching the concept 233607000 |Pneumococcal pneumonia (disorder)|, I find 8 distinct paths to the root due to the combinatorial possibilities of the polyhierarchy. I suspect that not all the paths are useful for data browsing or aggregation, and that a more parsimonious set would improve usability. Or am I needlessly worried?

Jim

From: Greater Plains Collaborative Software Development [mailto:[email protected]] On Behalf Of Wanta Keith M
Sent: Tuesday, February 04, 2014 1:42 PM
To: [email protected]
Subject: Re: optional columns in i2b2 dimension tables RE: Minutes of GPV-DEV call 20140128

Jim, that is correct: you build multiple paths per concept because of multiple inheritance (multiple generalization). In a sense, you end up with multiple concepts (one per path, so based on the number of parents) in the CONCEPT_DIMENSION.

UW is also on 1.6.

-Keith
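Jim's count of 8 paths and Keith's point about multiple CONCEPT_DIMENSION rows per concept both come from flattening a polyhierarchy into i2b2's tree of paths. A minimal sketch, assuming a toy child-to-parents map (hypothetical codes, not real SNOMED CT content) and a simplified backslash-delimited path in the style of i2b2's CONCEPT_PATH; the row dictionaries only mimic the CONCEPT_PATH/CONCEPT_CD columns and are not an actual loader:

```python
from functools import lru_cache
from typing import Dict, List

# Hypothetical toy polyhierarchy: child code -> parent codes.
# "C" and "E" have two parents each, like a multiply-inherited SNOMED concept.
PARENTS: Dict[str, List[str]] = {
    "ROOT": [],
    "A": ["ROOT"],
    "B": ["ROOT"],
    "C": ["A", "B"],
    "D": ["A"],
    "E": ["C", "D"],
}


def paths_to_root(code: str) -> List[List[str]]:
    """Enumerate every root-to-code path, one per chain of parents."""
    parents = PARENTS[code]
    if not parents:  # reached the root
        return [[code]]
    return [path + [code] for p in parents for path in paths_to_root(p)]


@lru_cache(maxsize=None)
def count_paths(code: str) -> int:
    """Count paths without listing them: the sum of the parents' counts."""
    parents = PARENTS[code]
    return 1 if not parents else sum(count_paths(p) for p in parents)


def concept_dimension_rows(code: str) -> List[Dict[str, str]]:
    """One row per path; every row shares the same concept code."""
    return [
        {"concept_path": "\\" + "\\".join(path) + "\\", "concept_cd": code}
        for path in paths_to_root(code)
    ]


if __name__ == "__main__":
    for row in concept_dimension_rows("E"):
        print(row["concept_path"], row["concept_cd"])
```

For "E" this yields three rows (\ROOT\A\C\E\, \ROOT\B\C\E\, \ROOT\A\D\E\) sharing one code. Because each concept's count is the sum of its parents' counts, branching at several levels compounds quickly; choosing a more parsimonious set amounts to deciding which of these rows to keep, ideally driven by the use cases rather than by the hierarchy alone.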
