RE: flowsheets in i2b2

Dan Connolly Wed, 05 Aug 2015 09:11:03 -0700

1. yes, multiselect was a pain. the relevant HERON code is 
epic_flowsheets_multiselect.sql<https://informatics.kumc.edu/work/browser/heron_load/epic_flowsheets_multiselect.sql>


We mentioned this in our AMIA 2011 
paper<http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3243191/>:
Measures documented as structured choices allowing multiple responses (e.g. 
“Dressing Status” is “clean dry and intact” and “reinforced”) are stored as a 
concatenated string that must be separated into individual observations within 
i2b2.
2. We started with discrete data only as well.

We're now working on adding de-identified flowsheet notes to HERON utilizing 
Medical College of Wisconsin's Unstructured Notes 
Deidentification<https://bitbucket.org/MCW_BMI/unstructured-notes-deidentification/overview>
 methodology. Wish us luck...

3. Flowsheet observations dominate our i2b2 observation_fact table. 889M in the 
McMillan Marsh 
release<https://informatics.kumc.edu/work/blog/heron-mcmillan-marsh-update>. 
Our typical ETL timing 
breakdown<https://informatics.kumc.edu/work/wiki/HeronLoad#heron_load> shows 
epic_flowsheets takes 15+ hours (out of something like 40 or 60).

Our latest release has 939M flowsheet facts; the main flowsheet ETL task took 
quite a bit more than 15 hrs:

2015-07-18 00:52:16     for     1 day, 11:34:00.379441  1         
load_epic_flowsheets


Query time performance is pretty good with ~1.5B facts, though we do use solid 
state storage and bitmapped indexes 
(i2b2_facts_index.sql<https://informatics.kumc.edu/work/browser/heron_load/i2b2_facts_index.sql>).
 My off-hand estimate is: 6 to 60seconds is typical, with more complex queries 
taking longer.

We have a real-time "how busy is heron?" page that shows the run-time of recent 
queries as well as a performance regression suite that we run at release time 
and weekly. I could get details from those if it's important.

--
Dan

________________________________
From: [email protected] [[email protected]] on behalf of Justin Dale 
[[email protected]]
Sent: Tuesday, August 04, 2015 10:10 PM
To: Dan Connolly
Cc: Tim Meyer
Subject: flowsheets in i2b2

Dan,

Wondering if you had any words of wisdom to share for loading flowsheet data in 
i2b2?  We've drawn up plans for our environment but wanted to check with you to 
see if there were any gotcha moments or valuable lessons-learned.  Couple of 
the items we need to address:
1. custom list values for measure values: stored as semi-colon separated list 
in the Clarity table when multiple values have been selected.  we'll need to 
parse each of these values and load as individual facts
2. we're going to exclude any non-numeric and non custom list measure values 
because being able to run a query to just check if the fact exists or not does 
not seem valuable.
3. not sure what to expect from i2b2 performance once we add nearly 3 billion 
more rows to the database.  considering starting with a subset of 500K to both 
test our i2b2 loader and performance.

Appreciate any information you can provide.  Thanks Dan.


Regards,

Justin Dale
Manager - Research Development and Support
University of Minnesota
Academic Health Center - Office of Information Systems
Phone: 612.624.9747
Fax: 612.624.7458
[email protected]<mailto:[email protected]>

_______________________________________________
Gpc-dev mailing list
[email protected]
http://listserv.kumc.edu/mailman/listinfo/gpc-dev

RE: flowsheets in i2b2

Reply via email to