I very much like diagrams, but I don't have a lot of experience with UML and I don't quite see how this UML diagram connects to our work... Could you do me a favor and replace generic labels such as "Class1" and "Class2" with something relevant to cancer/obesity/ALS?
And with respect to i2b2... I suppose these UML classes correspond to i2b2 concepts*, but I'd appreciate confirmation.

I'm really struggling to read the soft-coded generalization UML model. Is it a junction table<http://en.wikipedia.org/wiki/Junction_table>? Or two of them? And the arrows in the hard-coded model... can I read those as OWL subclass relationships, i.e. subset?

About keeping in sync with upstream i2b2, my understanding is that the transitive closure table is used to build the normal concept paths that i2b2 uses; so it's just like any of the other techniques that we'd have to use to put things into i2b2's star schema. But on the other hand, I suppose Henderson did speak of query-time performance impact, so maybe I'm off base. I have yet to study his code.

* I much prefer to call them terms<http://en.wikipedia.org/wiki/First-order_logic#Terms> and I wish i2b2 had as well. I'm convinced by Barry Smith's realist ontology writing<http://ontology.buffalo.edu/medo/reasoningBT.pdf> that "concepts" is muddy thinking, i.e. "International Standard Bad Philosophy."

________________________________
From: Greater Plains Collaborative Software Development [[email protected]] on behalf of Wanta Keith M [[email protected]]
Sent: Thursday, February 13, 2014 1:07 PM
To: [email protected]
Subject: Re: Minutes of GPV-DEV call 20140213 - Keith's Action Item

All,

Attached, you will find an image I just threw together from some UML screenshots I took as examples. It shows single and multiple inheritance (also referred to as Generalization) in its two most common UML design patterns. I was the technical reviewer of a book published earlier this year that discusses these patterns.

Hard coded generalization and soft coded generalization (also referred to as a meta model) are two implementation strategies for generalization.
Most common operational systems implement hard coded generalization because, with big data, this pattern performs best when there are more attributes per entity. In an ontology, the best approach is to use soft coded generalization simply because it allows you to model anything and everything.

Others pointed out the term transitive closure table. I don’t know the original reference for this term, but it’s identical to soft coded generalization for multiple inheritance and is the way i2b2 should have been designed. Also, if i2b2 moved to this design pattern, the LIKE operator wouldn’t be necessary anymore in i2b2. If you do not know how to tweak performance, the LIKE operator performs better in a relational database, which is probably why they chose that pattern.

One caveat to our conversation earlier during the GPC DEV meeting: UW-PCORI or WISC (UW Health / University of Wisconsin-Madison) has not used this approach for i2b2 because it deviates from standard i2b2 functionality. Changing the standard i2b2 source code is one possibility, but I would much rather propose a new design to Partners Healthcare; otherwise it creates upgrade nightmares for everyone. The rule of thumb for software is that by introducing more frameworks (which i2b2 has many of), the upgrade defect risks increase exponentially. I2b2 uses 9+ frameworks (depending on how you implement things), so having 11 PCORI schools standardizing code without having chosen a standard i2b2 version greatly concerns me.

We don’t have multiple inheritance ontologies loaded in i2b2 because of the issues with synonyms and concept management. We have not chosen to implement soft coded generalization (aka transitive closure table) in i2b2 because it is not standard. If someone needs assistance with moving a file to this soft generalization design model (before they move the data into the i2b2 METADATA tables or CONCEPT_DIMENSION), let me know.
By playing with the indexes, you can make it perform better. The columns you absolutely need are parent and child, and in order to query the data, a recursive query is needed. Depth can be calculated. The discriminator gives the context of the generalization, and from my software experience, my preference is to keep the generalization exhaustive. If you have others you need to add in the future, use a miscellaneous or other discriminator.

Best Regards,
Keith
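
To make the parent/child table and the recursive query discussed above concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is an illustration under stated assumptions, not i2b2's schema and not anyone's actual implementation: the table name `generalization`, the function `build_closure`, the term names, and the discriminator values are all hypothetical. It stores only parent, child, and discriminator, then uses a recursive query to derive every ancestor/descendant pair along with a calculated depth.

```python
import sqlite3

# Hypothetical edge list: (parent, child, discriminator).
# The terms and discriminator values are made up for illustration.
EDGES = [
    ("Disease", "Cancer", "by_type"),
    ("Disease", "ALS", "by_type"),
    ("Cancer", "Breast cancer", "by_site"),
    ("Cancer", "Colorectal cancer", "by_site"),
    # A second parent for the same child, i.e. multiple inheritance.
    ("Obesity-related condition", "Colorectal cancer", "by_risk_factor"),
]


def build_closure(edges):
    """Return (ancestor, descendant, depth) rows derived from parent/child edges."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE generalization (
            parent        TEXT NOT NULL,
            child         TEXT NOT NULL,
            discriminator TEXT,
            PRIMARY KEY (parent, child)
        )
        """
    )
    # The primary key covers lookups by parent; this index covers lookups by child.
    conn.execute("CREATE INDEX generalization_child ON generalization (child)")
    conn.executemany("INSERT INTO generalization VALUES (?, ?, ?)", edges)

    rows = conn.execute(
        """
        WITH RECURSIVE closure(ancestor, descendant, depth) AS (
            -- Base case: every direct parent/child edge has depth 1.
            SELECT parent, child, 1 FROM generalization
            UNION
            -- Recursive step: extend each known path by one more edge.
            SELECT c.ancestor, g.child, c.depth + 1
            FROM closure AS c
            JOIN generalization AS g ON g.parent = c.descendant
        )
        -- Keep the shortest depth when a pair is reachable by several paths.
        SELECT ancestor, descendant, MIN(depth)
        FROM closure
        GROUP BY ancestor, descendant
        ORDER BY ancestor, descendant
        """
    ).fetchall()
    conn.close()
    return rows


for row in build_closure(EDGES):
    print(row)
```

The sketch assumes the hierarchy has no cycles; a cycle would make the recursive query run without terminating, so real data would need a guard such as a maximum depth. Rows like these could presumably be used to generate path strings before loading into i2b2, but that step is not shown here.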
