I don’t see a source_master table here, so I’ll have to check that with Keith 
when he gets back.  I was able to finesse it for the short term, anyway.

I did succeed in finding and replicating the “normal_concept”  view also 
referenced in naaccr_concepts_load.

Thanks for the feedback.  I’ll try to spend a bit more time on multisitedev.  I 
should mention that the link for “internal onboarding/training notes” doesn’t 
work.  I think we discussed this earlier.  
https://bmi-work.kumc.edu/work/ticket/1247#comment:16


From: Dan Connolly [mailto:[email protected]]
Sent: Thursday, January 29, 2015 2:24 PM
To: Lenon Patrick; [email protected]
Subject: RE: code sharing and HERON ETL documentation (was: HackathonTwo 
follow-up action items)

I don't doubt that there are lots of undocumented aspects of HERON ETL, but the 
issue here seems more like the fact that 20,000 lines of code is a lot to get 
your head around, especially when the number of contributors has been in the 
single digits for a long time, so we haven't put a lot of emphasis on the new 
developer experience. 
MultiSiteDev<https://informatics.gpcnetwork.org/trac/Project/wiki/MultiSiteDev> 
is an attempt to address this issue. I wouldn't mind devoting more meeting time 
to this sort of thing.

In particular:

source_master is part of i2b2:

edu.harvard.i2b2.data> grep -ri source_master .
./Release_1-7/NewInstall/Crcdata/scripts/crc_create_uploader_oracle.sql:-- 
Table: SOURCE_MASTER
./Release_1-7/NewInstall/Crcdata/scripts/crc_create_uploader_oracle.sql:CREATE 
TABLE SOURCE_MASTER (
...

So I'd be surprised if you really don't have such a table.
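
For anyone who would rather run that search from Python, here is a rough sketch of the same idea. It is only an illustration: find_table_ddl is a hypothetical helper, not part of HERON or i2b2, and it assumes the i2b2 scripts are checked out somewhere under a local directory.

```python
import re
from pathlib import Path


def find_table_ddl(root, table):
    """Return (path, line number, line) for each CREATE TABLE <table> found.

    Hypothetical helper, not part of HERON or i2b2; it mimics a
    case-insensitive grep restricted to *.sql files under root.
    """
    # \s+ tolerates runs of spaces or tabs between the keywords;
    # \b keeps SOURCE_MASTER from matching a longer table name.
    pattern = re.compile(
        r"create\s+table\s+" + re.escape(table) + r"\b", re.IGNORECASE)
    hits = []
    for path in sorted(Path(root).rglob("*.sql")):
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Run from the top of the edu.harvard.i2b2.data checkout, find_table_ddl('.', 'source_master') should list the same crc_create_uploader_oracle.sql file that grep reports above. It will not catch schema-qualified names or a CREATE TABLE statement split across lines.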

The fact that HERON ETL relies on such tables is documented in the module 
header of 
heron_create.py<https://informatics.kumc.edu/work/browser/heron_load/heron_create.py>:
Database initialization scripts from i2b2 sources are used for this process::

   >>> options = _option  # un-hide for testing
   >>> options.i2b2_source
   'mock_i2b2_source'
The relevant task* is create_deid_schemas starting on line 
193<https://informatics.kumc.edu/work/browser/heron_load/heron_create.py#L193>:
@task
def create_deid_schemas(options):
    '''Create schemas for de-identified datamart.

    Note Well: Any existing schema is destroyed.
    .. todo:: consider checking  that there's no valuable data
It calls _create_datamart, where we see crc_create_uploader_oracle.sql 
mentioned by name.


* Paver tasks and 
dependencies<https://informatics.gpcnetwork.org/trac/Project/wiki/MultiSiteDev#Tasksanddependencies>
 are discussed in 
MultiSiteDev<https://informatics.gpcnetwork.org/trac/Project/wiki/MultiSiteDev>.

--
Dan
________________________________
From: Lenon Patrick [[email protected]]
Sent: Thursday, January 29, 2015 1:44 PM
To: Dan Connolly; [email protected]
Subject: RE: HackathonTwo follow-up action items
Regarding the #44 Portable Heron notes, one situation I’ve dealt with is 
references to a table which I do not have.  From naaccr_concepts_load.sql, for 
instance:

(select * from 
BlueHeronData.source_master@deid
   where source_cd like 'tumor_registry@%')

I don’t have a “source_master” table, and can’t begin to guess what “*” might 
comprise.  (OK, I know one field is called “source_cd”.) In other cases I’ve 
been able to infer what’s missing, or what our local name for a given table is, 
but this one stopped me cold.

For a database script to be portable, there need to be definitions somewhere of 
the tables and fields referenced.  This could be in the form of a data 
dictionary, or it could be simply output of data definition language (DDL) from 
the tables referenced.  Oftentimes the first script run does all the DDL up front, 
with data manipulation language (DML) in a separate script that can be run 
iteratively.
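
To illustrate the split, here is a minimal sketch. It uses Python's sqlite3 module with an in-memory database as a stand-in for Oracle. The column list is an assumption for illustration (only source_cd appears in the script above), and the source code value is made up.

```python
import sqlite3

# DDL: run once, up front. Only source_cd is known from the script
# quoted above; the description column is an illustrative assumption.
DDL = """
create table source_master (
  source_cd varchar(50) primary key,
  description varchar(300)
);
"""

# DML: kept separate so it can be re-run without recreating tables.
# "insert or replace" is SQLite syntax, used here to keep re-runs safe.
DML = """
insert or replace into source_master (source_cd, description)
values (?, ?)
"""


def table_columns(conn, table):
    """List column names, i.e. what 'select *' would return."""
    # PRAGMA table_info is SQLite-specific; row[1] is the column name.
    return [row[1] for row in conn.execute(f"pragma table_info({table})")]


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
for _ in range(2):  # running the DML twice still leaves a single row
    conn.execute(DML, ("tumor_registry@example", "hypothetical source"))
conn.commit()
```

With the DDL available, table_columns(conn, "source_master") answers the "what is in *" question directly. On Oracle, the all_tab_columns data dictionary view plays the role that PRAGMA table_info plays here.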

Apologies if I missed something obvious.  Peril of being the “new kid”…

p.s. If I did miss it, kindly point it out using small words and simple 
gestures.  ;)



From: [email protected] [mailto:[email protected]] On Behalf Of Dan Connolly
Sent: Tuesday, January 27, 2015 9:46 AM
To: [email protected]
Subject: HackathonTwo follow-up action items

I'm going over the meeting 
notes<https://informatics.gpcnetwork.org/trac/Project/wiki/HackathonTwo#record> 
(thanks, Laurel!) looking especially for follow-up items. I added comments that 
some of them aren't quite clear (as well as presentation materials to find, 
etc.)

This is a summary of follow-up action items that I found so far:

Names in section headings indicate follow-up item ownership.



Day 1 - Thu, Jan 
22<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.w3yrm6qir16q>

Introductions, Opening Remarks - GPC Phase 2 
LOI<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.46n62mizfpxm>

AM Session 1 - GPC CDM ETL (Campbell, 
Graham?)<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.1ahl9lh7w5ei>

AM Session 1 - Heron Code 
Sharing<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.zdilnib5ekxc>

identified i2b2 
(Mosa@MU)<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.523suzwa3zeu>

AM Session 2 - Breast Cancer Survey Finder File (Kowalski, UMN, UIOWA, MCRF, 
UTSW, 
McMahon)<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.nf8258clqxww>

AM Session 2 - Obesity, BMI percentile 
(?)<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.okvaig1ygap4>

AM Session 2 - Terminology Mapping 
Strategies<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.c9scujdqyl8e>

PM Session 1 - Terminologies (Reeder, Campbell, 
Connolly)<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.bl4ohqy5t9xr>

PM Session 2 - Data Quality 
(?)<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.v6pigqrdquys>

PM Session 2 - 
Encounters<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.qhq7fmqwj9xi>

PM Session 2 - Text Deidentification (Jacquie @ 
MCRF)<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.mhfn3n5l9hx>

DAY 
2<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.4te681ghpj2u>

AM Session 1 - Usable LOINC Lab Hierarchy - 
(Apathy)<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.5yzy9wlfcb19>

AM Session 1 - NLP/Text Notes Code 
Sharing<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.ydcola9j7ok>

AM Session 2 - Federated login 
(Mish)<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.wnkejbhu1tmr>

AM Session 2 - Building Analytic 
Datasets<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.u7ngvpz55k2i>

Using heron_extract to reshape data for use in 
REDCap<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.nvkllaqj6uky>

Analyzable 
Data<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.tfawvpu2vbot>

Data Analyzer User 
Interface<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.dpnvwd4jzxgm>

PM Session 1 - EMR 
Integration<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.xy7jgds0qjnw>
PM Session 
2<https://docs.google.com/document/d/13dA_ml1GSIhZ7fs-fWle5dPU95UlFQEoS6HJezSgKyQ/edit#heading=h.q0osz6mvtv3c>
--
Dan
_______________________________________________
Gpc-dev mailing list
[email protected]
http://listserv.kumc.edu/mailman/listinfo/gpc-dev
