1. yes, multiselect was a pain. the relevant HERON code is epic_flowsheets_multiselect.sql<https://informatics.kumc.edu/work/browser/heron_load/epic_flowsheets_multiselect.sql>
We mentioned this in our AMIA 2011 paper<http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3243191/>: Measures documented as structured choices allowing multiple responses (e.g. “Dressing Status” is “clean dry and intact” and “reinforced”) are stored as a concatenated string that must be separated into individual observations within i2b2. 2. We started with discrete data only as well. We're now working on adding de-identified flowsheet notes to HERON utilizing Medical College of Wisconsin's Unstructured Notes Deidentification<https://bitbucket.org/MCW_BMI/unstructured-notes-deidentification/overview> methodology. Wish us luck... 3. Flowsheet observations dominate our i2b2 observation_fact table. 889M in the McMillan Marsh release<https://informatics.kumc.edu/work/blog/heron-mcmillan-marsh-update>. Our typical ETL timing breakdown<https://informatics.kumc.edu/work/wiki/HeronLoad#heron_load> shows epic_flowsheets takes 15+ hours (out of something like 40 or 60). Our latest release has 939M flowsheet facts; the main flowsheet ETL task took quite a bit more than 15 hrs: 2015-07-18 00:52:16 for 1 day, 11:34:00.379441 1 load_epic_flowsheets Query time performance is pretty good with ~1.5B facts, though we do use solid state storage and bitmapped indexes (i2b2_facts_index.sql<https://informatics.kumc.edu/work/browser/heron_load/i2b2_facts_index.sql>). My off-hand estimate is: 6 to 60seconds is typical, with more complex queries taking longer. We have a real-time "how busy is heron?" page that shows the run-time of recent queries as well as a performance regression suite that we run at release time and weekly. I could get details from those if it's important. -- Dan ________________________________ From: [email protected] [[email protected]] on behalf of Justin Dale [[email protected]] Sent: Tuesday, August 04, 2015 10:10 PM To: Dan Connolly Cc: Tim Meyer Subject: flowsheets in i2b2 Dan, Wondering if you had any words of wisdom to share for loading flowsheet data in i2b2? We've drawn up plans for our environment but wanted to check with you to see if there were any gotcha moments or valuable lessons-learned. Couple of the items we need to address: 1. custom list values for measure values: stored as semi-colon separated list in the Clarity table when multiple values have been selected. we'll need to parse each of these values and load as individual facts 2. we're going to exclude any non-numeric and non custom list measure values because being able to run a query to just check if the fact exists or not does not seem valuable. 3. not sure what to expect from i2b2 performance once we add nearly 3 billion more rows to the database. considering starting with a subset of 500K to both test our i2b2 loader and performance. Appreciate any information you can provide. Thanks Dan. Regards, Justin Dale Manager - Research Development and Support University of Minnesota Academic Health Center - Office of Information Systems Phone: 612.624.9747 Fax: 612.624.7458 [email protected]<mailto:[email protected]>
_______________________________________________ Gpc-dev mailing list [email protected] http://listserv.kumc.edu/mailman/listinfo/gpc-dev
