Thanks Dan - this is very helpful.  We're going to start with a subset
(200K encounters w/ related flowsheets) to test the process and
performance.  More questions may follow as we make our way down the rabbit
hole.

Regards,

Justin Dale
Manager - Research Development and Support
University of Minnesota
Academic Health Center - Office of Information Systems
Phone: 612.624.9747
Fax: 612.624.7458
[email protected]

On Wed, Aug 5, 2015 at 11:10 AM, Dan Connolly <[email protected]> wrote:

> 1. yes, multiselect was a pain. the relevant HERON code is
> epic_flowsheets_multiselect.sql
> <https://informatics.kumc.edu/work/browser/heron_load/epic_flowsheets_multiselect.sql>
>
> We mentioned this in our AMIA 2011 paper
> <http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3243191/>:
>
> Measures documented as structured choices allowing multiple responses
> (e.g. “Dressing Status” is “clean dry and intact” and “reinforced”) are
> stored as a concatenated string that must be separated into individual
> observations within i2b2.
>
> 2. We started with discrete data only as well.
>
> We're now working on adding de-identified flowsheet notes to HERON
> utilizing Medical College of Wisconsin's ​Unstructured Notes
> Deidentification
> <https://bitbucket.org/MCW_BMI/unstructured-notes-deidentification/overview>
> methodology. Wish us luck...
>
> 3. Flowsheet observations dominate our i2b2 observation_fact table. 889M
> in the McMillan Marsh release
> <https://informatics.kumc.edu/work/blog/heron-mcmillan-marsh-update>. Our
> typical ETL timing breakdown
> <https://informatics.kumc.edu/work/wiki/HeronLoad#heron_load> shows
> epic_flowsheets takes 15+ hours (out of something like 40 or 60).
>
> Our latest release has 939M flowsheet facts; the main flowsheet ETL task
> took quite a bit more than 15 hrs:
>
> 2015-07-18 00:52:16   for     1 day, 11:34:00.379441  1         
> load_epic_flowsheets
>
> Query time performance is pretty good with ~1.5B facts, though we do use
> solid state storage and bitmapped indexes (i2b2_facts_index.sql
> <https://informatics.kumc.edu/work/browser/heron_load/i2b2_facts_index.sql>).
> My off-hand estimate is: 6 to 60seconds is typical, with more complex
> queries taking longer.
>
> We have a real-time "how busy is heron?" page that shows the run-time of
> recent queries as well as a performance regression suite that we run at
> release time and weekly. I could get details from those if it's important.
>
> --
> Dan
>
> ------------------------------
> *From:* [email protected] [[email protected]] on behalf of Justin Dale [
> [email protected]]
> *Sent:* Tuesday, August 04, 2015 10:10 PM
> *To:* Dan Connolly
> *Cc:* Tim Meyer
> *Subject:* flowsheets in i2b2
>
> Dan,
>
> Wondering if you had any words of wisdom to share for loading flowsheet
> data in i2b2?  We've drawn up plans for our environment but wanted to check
> with you to see if there were any gotcha moments or valuable
> lessons-learned.  Couple of the items we need to address:
> 1. custom list values for measure values: stored as semi-colon separated
> list in the Clarity table when multiple values have been selected.  we'll
> need to parse each of these values and load as individual facts
> 2. we're going to exclude any non-numeric and non custom list measure
> values because being able to run a query to just check if the fact exists
> or not does not seem valuable.
> 3. not sure what to expect from i2b2 performance once we add nearly 3
> billion more rows to the database.  considering starting with a subset of
> 500K to both test our i2b2 loader and performance.
>
> Appreciate any information you can provide.  Thanks Dan.
>
>
> Regards,
>
> Justin Dale
> Manager - Research Development and Support
> University of Minnesota
> Academic Health Center - Office of Information Systems
> Phone: 612.624.9747
> Fax: 612.624.7458
> [email protected]
>
_______________________________________________
Gpc-dev mailing list
[email protected]
http://listserv.kumc.edu/mailman/listinfo/gpc-dev

Reply via email to