FYI, Alex and I will be at PyData Silicon Valley 2014 <http://pydata.org/sv2014>
talking about HERON ETL:
<http://pydata.org/sv2014/abstracts/#193_>
Using Python and Paver to Control a Large Medical Informatics ETL Process
May 03 - 12:40 p.m.
Alex F. Bokov <http://pydata.org/sv2014/speakers/#271>, Dan Connolly 
<http://pydata.org/sv2014/speakers/#275>
The Greater Plains Collaborative (GPC) is a new network of 10 leading 
medical centers in 7 states working to improve healthcare delivery and advance 
research by mining electronic medical records and patient registries. To do 
this, we must de-identify and securely migrate patient data from heterogeneous 
formats (e.g. Clarity, IDX, NAACCR) to our data warehouse platform (HERON), 
which is built on top of I2B2. Task dependencies in the complex network of 
Python scripts that wrap our SQL code are managed via Paver, permitting a 
robust, modular, and maintainable architecture. In the process, we developed 
new Python tools for generating dependency graphs from SQL code and for 
integrating R and REDCap into our analytical pipeline. Moreover, by adapting 
our Python code to work across multiple member institutions, we have started 
moving toward a generic workflow for building, testing, documenting, and 
deploying medical informatics research data warehouses.
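
For anyone unfamiliar with the idea of deriving task dependencies from SQL 
code, here is a minimal sketch using only the Python standard library. It is 
not the HERON tooling and does not use Paver; the script names, table names, 
and the regex-based parsing are all hypothetical, chosen only to illustrate 
the idea of ordering scripts by the tables they create and read.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical scripts (name -> SQL text); illustrative only, not HERON code.
SCRIPTS = {
    "load_patients.sql": "create table patients as select * from raw_patients;",
    "load_visits.sql": "create table visits as select * from raw_visits;",
    "deid_visits.sql": (
        "create table deid_visits as "
        "select v.* from visits v join patients p on v.pid = p.pid;"
    ),
}

# Naive patterns: real SQL would need a proper parser.
CREATES = re.compile(r"create\s+table\s+(\w+)", re.IGNORECASE)
READS = re.compile(r"(?:from|join)\s+(\w+)", re.IGNORECASE)


def script_dependencies(scripts):
    """Map each script to the set of scripts that create tables it reads."""
    creator = {}
    for name, sql in scripts.items():
        for table in CREATES.findall(sql):
            creator[table.lower()] = name

    deps = {}
    for name, sql in scripts.items():
        needed = {
            creator[table.lower()]
            for table in READS.findall(sql)
            if table.lower() in creator
        }
        needed.discard(name)  # a script does not depend on itself
        deps[name] = needed
    return deps


def run_order(scripts):
    """Return script names in an order that respects the dependencies."""
    graph = script_dependencies(scripts)
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for script in run_order(SCRIPTS):
        print(script)
```

In a task runner such as Paver, each script would typically become a task 
that declares the tasks it needs, rather than being ordered by hand.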

--
Dan

_______________________________________________
Gpc-dev mailing list
[email protected]
http://listserv.kumc.edu/mailman/listinfo/gpc-dev
