Re: Minutes of GPV-DEV call 20140213 - Keith's Action Item

Dan Connolly Thu, 13 Feb 2014 16:25:16 -0800

I very much like diagrams, but I don't have a lot of experience with UML and 
I don't quite see how this UML diagram connects to our work... Could you do me 
a favor and replace generic labels such as "Class1" and "Class2" with something 
relevant to cancer/obesity/ALS?


And with respect to i2b2... I suppose these UML classes correspond to i2b2 
concepts*, but I'd appreciate confirmation.

I'm really struggling to read the soft-coded generalization UML model. Is it a 
junction table<http://en.wikipedia.org/wiki/Junction_table>? Or two of them?

And the arrows in the hard-coded model... can I read those as owl subclass 
relationships? i.e. subset?

About keeping in sync with upstream i2b2, my understanding is that the 
transitive closure table is used to build the normal concept paths that i2b2 
uses; so it's just like any of the other techniques that we'd have to use to 
put things into i2b2's star schema. But on the other hand, I suppose Henderson 
did speak of query-time performance impact, so maybe I'm off base. I have yet 
to study his code.
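
To make the "closure table builds concept paths" reading concrete, here is a minimal sketch of how path strings could be derived from parent/child pairs. It is an assumption about the general technique, not Henderson's code: the function name, the "/" separator, and the sample terms are all hypothetical, and it assumes the hierarchy has no cycles.

```python
def concept_paths(edges, root, sep="/"):
    """Enumerate every root-to-term path implied by (parent, child) pairs.

    A term with two parents yields two paths, which is how a
    multiple-inheritance hierarchy flattens into path strings.
    Assumes the edges form an acyclic graph.
    """
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    paths = []

    def walk(term, prefix):
        path = prefix + term + sep
        paths.append(path)
        for child in sorted(children.get(term, [])):
            walk(child, path)

    walk(root, sep)
    return paths


# Hypothetical terms; "Obesity" has two parents.
EDGES = [
    ("Term", "Disease"),
    ("Term", "Risk factor"),
    ("Disease", "Obesity"),
    ("Risk factor", "Obesity"),
]
PATHS = concept_paths(EDGES, "Term")
```

In this sketch "Obesity" ends up with one path under "Disease" and another under "Risk factor", so the path table grows with the number of routes to a term rather than the number of terms.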

* I much prefer to call them 
terms<http://en.wikipedia.org/wiki/First-order_logic#Terms> and I wish i2b2 had 
as well. I'm convinced by Barry Smith's realist ontology 
writing<http://ontology.buffalo.edu/medo/reasoningBT.pdf> that "concepts" is 
muddy thinking, i.e. "International Standard Bad Philosophy."

________________________________
From: Greater Plains Collaborative Software Development 
[[email protected]] on behalf of Wanta Keith M [[email protected]]
Sent: Thursday, February 13, 2014 1:07 PM
To: [email protected]
Subject: Re: Minutes of GPV-DEV call 20140213 - Keith's Action Item

All,

Attached, you will find an image I just threw together with some UML 
screen shots I took as examples.  It shows single and multiple inheritance 
(also referred to as Generalization) in its two most common UML design 
patterns.  I was the technical reviewer of a book published earlier this year 
that discusses these patterns.  Hard coded generalization and soft coded 
generalization (also referred to as a meta model) are two implementation 
strategies for generalization.  Most common operational systems implement the 
hard coded generalization because with big data, this pattern performs the best 
with more attributes per entity.  In an ontology, the best approach is to use 
soft coded generalization simply because it allows you to model anything and 
everything.

Others pointed out the term transitive closure table.  I don’t know the 
original reference of this term, but it’s identical to the soft coded 
generalization for multiple inheritance and is the way i2b2 should have been 
designed.  Also, if i2b2 moved to this design pattern, the LIKE operator 
wouldn’t be necessary anymore in i2b2.  If you do not know how to tweak 
performance, the LIKE operator performs better in a relational database, which 
is probably why they chose that pattern.
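
As a rough illustration of the difference, the sketch below finds the same subtree two ways: a LIKE prefix match on a path column, and an equality lookup on an ancestor/descendant closure table. It uses Python's built-in sqlite3 module. The table names, column names, codes, and "/"-delimited paths are simplified stand-ins invented for the example, not the actual i2b2 schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Path-based layout: the hierarchy is encoded in a string column.
CREATE TABLE concept_path (path TEXT, code TEXT);
INSERT INTO concept_path VALUES
  ('/Disease/', 'DIS'),
  ('/Disease/Neoplasm/', 'NEO'),
  ('/Disease/Neoplasm/Breast cancer/', 'BRC'),
  ('/Disease/Metabolic disorder/', 'MET'),
  ('/Disease/Metabolic disorder/Obesity/', 'OBE');

-- Closure layout: one row per (ancestor, descendant) pair,
-- including each term paired with itself.
CREATE TABLE closure (ancestor TEXT, descendant TEXT);
INSERT INTO closure VALUES
  ('DIS', 'DIS'), ('DIS', 'NEO'), ('DIS', 'BRC'), ('DIS', 'MET'), ('DIS', 'OBE'),
  ('NEO', 'NEO'), ('NEO', 'BRC'),
  ('MET', 'MET'), ('MET', 'OBE'),
  ('BRC', 'BRC'), ('OBE', 'OBE');
""")

# Subtree under Neoplasm via a LIKE prefix match on the path string.
by_like = [row[0] for row in conn.execute(
    "SELECT code FROM concept_path WHERE path LIKE ? ORDER BY code",
    ("/Disease/Neoplasm/%",),
)]

# Same subtree via an equality lookup on the closure table.
by_closure = [row[0] for row in conn.execute(
    "SELECT descendant FROM closure WHERE ancestor = ? ORDER BY descendant",
    ("NEO",),
)]
```

Both queries return the same codes here. Which one is faster in practice depends on the database and its indexes, so this sketch makes no performance claim.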

One caveat to our conversation earlier during the GPC DEV meeting.  UW-PCORI or 
WISC (UW Health / University of Wisconsin-Madison) has not used this approach 
for i2b2 because it deviates from standard i2b2 functionality.  Rather than 
changing standard i2b2 source code (which is one possibility), I would much 
prefer to propose a new design to Partners Healthcare; changing the source 
locally creates upgrade nightmares for everyone.  The rule 
of thumb for software is that by introducing more frameworks (which i2b2 has 
many of), the upgrade defect risks increase exponentially.  I2b2 uses 9+ 
frameworks (depending on how you implement things), so if we have 11 PCORI 
schools standardizing code that haven’t chosen a standard i2b2 version, this 
greatly concerns me.  We don’t have multiple inheritance ontologies loaded in 
i2b2 because of the issues with synonyms and concept management.  We have not 
chosen to implement soft coded generalization (aka transitive closure table) in 
i2b2 because it is not standard.

If someone needs assistance with moving a file to this soft generalization 
design model (before they move the data into the i2b2 METADATA tables or 
CONCEPT_DIMENSION), let me know.  By playing with the indexes, you can make it 
perform better.  The columns you absolutely need are parent and child, and in 
order to query the data, a recursive query is needed.  Depth can be calculated. 
 The discriminator gives the context of the generalization, and from my 
software experience, having the generalization as exhaustive is my preference.  
If you have others you need to add in the future, use a miscellaneous or other 
discriminator.
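
For illustration, a parent/child table queried recursively, with depth calculated along the way, might look like the following sketch. It uses Python's built-in sqlite3 module; the table name term_edge, the function name, and the sample terms are made up for the example and are not taken from i2b2 or from the attached diagram.

```python
import sqlite3

# Hypothetical parent/child rows; only these two columns are required.
EDGES = [
    ("Disease", "Neoplasm"),
    ("Neoplasm", "Breast cancer"),
    ("Disease", "Metabolic disorder"),
    ("Metabolic disorder", "Obesity"),
]


def descendants_with_depth(root):
    """Return (term, depth) for root and everything below it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE term_edge (parent TEXT, child TEXT)")
    conn.executemany("INSERT INTO term_edge VALUES (?, ?)", EDGES)
    rows = conn.execute(
        """
        WITH RECURSIVE walk(term, depth) AS (
            SELECT ?, 0                      -- start at the root, depth 0
            UNION ALL
            SELECT e.child, w.depth + 1      -- step down one level
            FROM term_edge AS e
            JOIN walk AS w ON e.parent = w.term
        )
        SELECT term, depth FROM walk ORDER BY depth, term
        """,
        (root,),
    ).fetchall()
    conn.close()
    return rows
```

With multiple inheritance, a term reachable through two parents would appear once per route, so a real query would likely need MIN(depth) with GROUP BY, or DISTINCT, depending on what depth is supposed to mean.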

Best Regards,
Keith
