I made a passing comment in one of the past gpc-dev weekly calls about UNMC
standing up an identified i2b2 on Postgresql. Dan asked if I would be willing
to spend part of an upcoming gpc-dev call to present what we did, and I will
happily do so.
I wrote scripts that move data from Oracle into Postgresql in a low-tech but
speedy approach. The gist of it is that :
1. Load metadata (and concept_dimension , provider_dimension, and the snomed
transitive closure table) from our deid instance (for now, once dust settles,
we'll have more detailed metadata for full zip codes and that sort of thing and
hence some metadata that will be different)
2. Move facts from our idwk instance ("oracle working spot for identified
data") into the new Postgresql instance.
3. Create indexes in Postgresql.
I think the process I did will work (with some polish) for sites that follow a
Heron-like model with prototype fact tables.(even if they don't use the heron
paver scripts and what not). Take those prototype fact tables/views and shovel
them into Postgresql.
It took about 24.5 hours for our 500,000,000-ish records in our system to load
into Postgresql. Step #3 took the longest time, which was a surprise to me.
It is two Python scripts that run; one for #1 and one for #2 and #3.
Performance wise, it seems to be a bit of a wash so far with Oracle and
Postgresql. Some things run better in one place or the other. I've not done
any tuning of the Postgresql database though - it is built with out of the box
defaults. Jim McClay has done more plinking around with real queries that I
have and can comment more about the performance aspect. We are very early on
in the looking at performance, but nothing we have seen so far would make me
change my mind about Postgresql being a quite usable i2b2 backend.
More details to follow!
Hubert Hickman
Systems Sr. Analyst
[email protected]<mailto:[email protected]>
(402) 559-4838
________________________________
Nebraska Medicine E-mail Confidentiality Disclaimer
The information in this e-mail may be privileged and confidential, intended
only for the use of the addressee(s) above. Any unauthorized use or disclosure
of this information is prohibited. If you have received this e-mail by mistake,
please delete it and immediately contact the sender.
_______________________________________________
Gpc-dev mailing list
[email protected]
http://listserv.kumc.edu/mailman/listinfo/gpc-dev