I don’t see a source_master table here, so I’ll have to check that with Keith when he gets back. I was able to finesse it for the short term, anyway.
I did succeed in finding and replicating the “normal_concept” view also referenced in naaccr_concepts_load.

Thanks for the feedback. I’ll try to spend a bit more time on multisitedev. I should mention that the link for “internal onboarding/training notes” doesn’t work. I think we discussed this earlier.
https://bmi-work.kumc.edu/work/ticket/1247#comment:16

From: Dan Connolly [mailto:[email protected]]
Sent: Thursday, January 29, 2015 2:24 PM
To: Lenon Patrick; [email protected]
Subject: RE: code sharing an HERON ETL documentation (was: HackathonTwo follow-up action items)

I don't doubt that there are lots of undocumented aspects of HERON ETL, but the issue here seems more like the fact that 20,000 lines of code is a lot to get your head around, especially when the number of contributors has been in the single digits for a long time, so we haven't put a lot of emphasis on the new developer experience.

MultiSiteDev<https://informatics.gpcnetwork.org/trac/Project/wiki/MultiSiteDev> is an attempt to address this issue. I wouldn't mind devoting more meeting time to this sort of thing.

In particular: source_master is part of i2b2:

    edu.harvard.i2b2.data> grep -ri source_master .
    ./Release_1-7/NewInstall/Crcdata/scripts/crc_create_uploader_oracle.sql:-- Table: SOURCE_MASTER
    ./Release_1-7/NewInstall/Crcdata/scripts/crc_create_uploader_oracle.sql:CREATE TABLE SOURCE_MASTER (
    ...

So I'd be surprised if you really don't have such a table.
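
For anyone working where grep isn't handy, the same kind of case-insensitive search over a checkout can be sketched in Python. This is only an illustration, not part of HERON ETL or i2b2: the function name find_table_mentions, the demo() helper, and the throwaway file names it creates are all hypothetical, and the demo tree merely imitates the two matching lines shown in the grep output.

```python
import os
import tempfile


def find_table_mentions(root, needle):
    """Yield (path, line_number, line) for each line under root that
    contains needle, ignoring case (roughly what `grep -ri` reports)."""
    needle = needle.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps the scan going on odd encodings.
                with open(path, encoding="utf-8", errors="replace") as f:
                    for lineno, line in enumerate(f, start=1):
                        if needle in line.lower():
                            yield path, lineno, line.rstrip("\n")
            except OSError:
                continue  # unreadable file: skip it


def demo():
    """Search a throwaway tree (hypothetical file names) for source_master."""
    with tempfile.TemporaryDirectory() as root:
        scripts = os.path.join(root, "scripts")
        os.makedirs(scripts)
        with open(os.path.join(scripts, "create_uploader.sql"), "w",
                  encoding="utf-8") as f:
            f.write("-- Table: SOURCE_MASTER\nCREATE TABLE SOURCE_MASTER (\n")
        with open(os.path.join(scripts, "other.sql"), "w",
                  encoding="utf-8") as f:
            f.write("select 1 from dual;\n")
        # Consume the generator before the temporary directory is removed.
        return [(os.path.relpath(p, root), n, text)
                for p, n, text in find_table_mentions(root, "source_master")]


if __name__ == "__main__":
    for hit in demo():
        print(hit)
```

Against a real checkout, the call would be something like find_table_mentions(".", "source_master"), run from the top of the i2b2 data source tree.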
The fact that HERON ETL relies on such tables is documented in the module header of heron_create.py<https://informatics.kumc.edu/work/browser/heron_load/heron_create.py>:

    Database initialization scripts from i2b2 sources are used for
    this process::

      >>> options = _option  # un-hide for testing
      >>> options.i2b2_source
      'mock_i2b2_source'

The relevant task* is create_deid_schemas starting on line 193<https://informatics.kumc.edu/work/browser/heron_load/heron_create.py#L193>:

    @task
    def create_deid_schemas(options):
        '''Create schemas for de-identified datamart.

        Note Well: Any existing schema is destroyed.

        .. todo:: consider checking that there's no valuable data

It calls _create_datamart, where we see crc_create_uploader_oracle.sql mentioned by name.

* Paver tasks and dependencies<https://informatics.gpcnetwork.org/trac/Project/wiki/MultiSiteDev#Tasksanddependencies> are discussed in MultiSiteDev<https://informatics.gpcnetwork.org/trac/Project/wiki/MultiSiteDev>.

--
Dan

________________________________
From: Lenon Patrick [[email protected]]
Sent: Thursday, January 29, 2015 1:44 PM
To: Dan Connolly; [email protected]
Subject: RE: HackathonTwo follow-up action items

Regarding the #44 Portable Heron notes, one situation I’ve dealt with is references to a table which I do not have. From naaccr_concepts_load.sql, for instance:

    (select * from BlueHeronData.source_master@deid
     where source_cd like 'tumor_registry@%')

I don’t have a “source_master” table, and can’t begin to guess what “*” might comprise. (OK, I know one field is called “source_cd”.) In other cases I’ve been able to infer what’s missing, or what our local name for a given table is, but this one stopped me cold.

For a database script to be portable, there need to be definitions somewhere of the tables and fields referenced.
This could be in the form of a data dictionary, or it could simply be the output of data definition language (DDL) from the tables referenced. Oftentimes the first script run does all the DDL up front, with data manipulation language (DML) in a separate script that can be run iteratively.

Apologies if I missed something obvious. Peril of being the “new kid”…

p.s. If I did miss it, kindly point it out using small words and simple gestures. ;)

From: [email protected] [mailto:[email protected]] On Behalf Of Dan Connolly
Sent: Tuesday, January 27, 2015 9:46 AM
To: [email protected]
Subject: HackathonTwo follow-up action items

I'm going over the meeting notes<https://informatics.gpcnetwork.org/trac/Project/wiki/HackathonTwo#record> (thanks, Laurel!) looking especially for follow-up items. I added comments that some of them aren't quite clear (as well as presentation materials to find, etc.)

This is a summary of follow-up action items that I found so far. Names in section headings indicate follow-up item ownership.

Day 1 - Thu, Jan 22<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.w3yrm6qir16q>
  Introductions, Opening Remarks - GPC Phase 2 LOI<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.46n62mizfpxm>
  AM Session 1 - GPC CDM ETL (Campbell, Graham?)<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.1ahl9lh7w5ei>
  AM Session 1 - Heron Code Sharing<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.zdilnib5ekxc>
  identified i2b2 (Mosa@MU)<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.523suzwa3zeu>
  AM Session 2 - Breast Cancer Survey Finder File (Kowalski, UMN, UIOWA, MCRF, UTSW, McMahon)<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.nf8258clqxww>
  AM Session 2 - Obesity, BMI percentile (?)<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.okvaig1ygap4>
  AM Session 2 - Terminology Mapping Strategies<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.c9scujdqyl8e>
  PM Session 1 - Terminologies (Reeder, Campbell, Connolly)<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.bl4ohqy5t9xr>
  PM Session 2 - Data Quality (?)<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.v6pigqrdquys>
  PM Session 2 - Encounters<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.qhq7fmqwj9xi>
  PM Session 2 - Text Deidentification (Jacquie @ MCRF)<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.mhfn3n5l9hx>

DAY 2<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.4te681ghpj2u>
  AM Session 1 - Usable LOINC Lab Hierarchy - (Apathy)<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.5yzy9wlfcb19>
  AM Session 1 - NLP/Text Notes Code Sharing<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.ydcola9j7ok>
  AM Session 2 - Federated login (Mish)<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.wnkejbhu1tmr>
  AM Session 2 - Building Analytic Datasets<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.u7ngvpz55k2i>
  Using heron_extract to reshape data for use in REDCap<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.nvkllaqj6uky>
  Analyzable Data<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.tfawvpu2vbot>
  Data Analyzer User Interface<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.dpnvwd4jzxgm>
  PM Session 1 - EMR Integration<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.xy7jgds0qjnw>
  PM Session 2<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.q0osz6mvtv3c>

--
Dan
_______________________________________________ Gpc-dev mailing list [email protected] http://listserv.kumc.edu/mailman/listinfo/gpc-dev
