#145: transform (ETL) GPC i2b2 data to PCORNet CDM
--------------------------+----------------------------
Reporter: dconnolly | Owner: ngraham
Type: enhancement | Status: assigned
Priority: major | Milestone: data-domains2
Component: data-sharing | Resolution:
Keywords: | Blocked By: 109
Blocking: 160 |
--------------------------+----------------------------
Comment (by ngraham):
Thanks to Alex at UTHSCSA (via Dan) and Dan for pointing us to the fact
that the `c_dimcode` is what's used to by i2b2 to find the relevant
concept codes from the concept dimension and can be different from the
`c_fullname` which defines what we see in the concept hierarchy.
Given that pointer, I was able to run a query at KU using the i2b2 PCORI
hierarchy ([https://bitbucket.org/DanC/pcornet-dm pconet-dm]) that
returned patient counts based on our local terminology. Of course, we
want to use GPC terminology eventually.
I then used Dan's [wiki:PortQuery Port Query plugin] to run the query on
Babel (which I then put in the shared folder):
`SHARED->CDM_ETL_145->CDM_GPC_Mapping_Prototype [12-18-2014]
[[email protected]]`.
==== Details ====
Load the PCORI terminlology that Dan built ([https://bitbucket.org/DanC
/pcornet-dm pconet-dm]) on the KUMC i2b2 test instance.
On the test instance:
{{{
create table blueheronmetadata.pcori_terms as
select * from blueheronmetadata.heron_terms where 1=0;
}}}
Export the PCORI hierarchy from Babel and import on our test instance:
{{{
select * from i2b2metadata.table_access where c_table_name =
'pcori_terms'; -- for reference
select * from i2b2metadata.pcori_terms;
}}}
Imported the resulting csv and updated the table_access table accordingly.
Then, as a test, pick a term to translate to local terminology (PCORI:
`\\PCORI_\PCORI\DEMOGRAPHIC\HISPANIC\Y\` HERON:
`\i2b2\Demographics\Ethnicity\Hispanic, Latino or Spanish Origin\`)
{{{
update BlueHeronMetadata.pcori_terms
set c_dimcode = '\i2b2\Demographics\Ethnicity\Hispanic, Latino or Spanish
Origin\' where c_fullname = '\PCORI\DEMOGRAPHIC\HISPANIC\Y\'
;
}}}
Now, running a query based on PCORI terminology returns patient counts at
KU!
{{{
Finished Query: "CDM_ETL_Test"
[3.3 secs]
Compute Time: 2 secs
Number of patients for "CDM_ETL_Test"
patient_count: 26
}}}
Then, I used Dan's [wiki:PortQuery Port Query plugin] to run the query on
Babel (which I then put in the shared folder):
`SHARED->CDM_ETL_145->CDM_GPC_Mapping_Prototype [12-18-2014]
[[email protected]]`.
Note that one slight wrinkle is that the c_columndatatype should be "T"
rather than '@' (thanks to Matt for helping me debug that one). I added
[https://bitbucket.org/DanC/pcornet-dm/issue/6/c_columndatatype-should-
be-t-rather-than an issue to pcornet-dm]. For now, I just did:
{{{
update i2b2metadata.pcori_terms
set c_columndatatype = 'T' where c_fullname =
'\PCORI\DEMOGRAPHIC\HISPANIC\Y\'
;
}}}
Note I had to do the same thing on the KU instance.
I'm close to being able to then build the CDM table with at least that one
column based on [https://bitbucket.org/njgraham/pcori-annotated-data-
dictionary/src/1c1c0f980377bbd1adbe839edf84d8de92f5b1de/CDM_transform.sql?at=default
CDM_transform.sql] we used for the ETL Annotated Data Dictionary. Instead
of using the `path_map` file to map from PCORI to HERON terms, it could
just query the PCORI hierarchy directly.
--
Ticket URL:
<http://informatics.gpcnetwork.org/trac/Project/ticket/145#comment:14>
gpc-informatics <http://informatics.gpcnetwork.org/>
Greater Plains Network - Informatics
_______________________________________________
Gpc-dev mailing list
[email protected]
http://listserv.kumc.edu/mailman/listinfo/gpc-dev