FYI, Alex and I will be at PyData Silicon Valley 2014 <http://pydata.org/sv2014> talking about HERON ETL:

<http://pydata.org/sv2014/abstracts/#193_>
Using Python and Paver to Control a Large Medical Informatics ETL Process
May 03 - 12:40 p.m.
Alex F. Bokov <http://pydata.org/sv2014/speakers/#271>, Dan Connolly <http://pydata.org/sv2014/speakers/#275>

The Greater Plains Collaborative (GPC) is a new network of 10 leading medical centers in 7 states working to improve healthcare delivery and advance research by mining electronic medical records and patient registries. To do this, we must de-identify and securely migrate patient data from heterogeneous formats (e.g., Clarity, IDX, NAACCR) to our data warehouse platform (HERON), which is built on top of I2B2.

Task dependencies in the complex network of Python scripts that wrap our SQL code are managed via Paver, permitting a robust, modular, and maintainable architecture. In the process, we developed new Python tools for generating dependency graphs from SQL code and for integrating R and RedCap into our analytical pipeline. Moreover, by adapting our Python code to work across multiple member institutions, we have started moving toward a generic workflow for building, testing, documenting, and deploying medical informatics research data warehouses.

-- Dan
_______________________________________________
Gpc-dev mailing list
[email protected]
http://listserv.kumc.edu/mailman/listinfo/gpc-dev
