RE: [gpc-informatics] #109: mapping to PCORI CDM (aka mini-sentinel data model)

Klann, Jeffrey G. Mon, 30 Jun 2014 14:22:48 -0700

Dan, (+cc cer-ont and gpc-dev - I lost track of which this should go to),

We found a couple of issues in our ontology. The following is related to your 
code:


1) c_metadataxml, if present, needs to start with <?xml version="1.0" 
encoding="UTF-8"?>
2) The spec includes some non-ASCII characters (long hyphens and backquotes) 
that confuse some dbs. They need to be converted to ASCII somewhere in the 
pipeline.
3) We added modifier exclusions to the cases (non-draggable folders; visual 
attribute begins with 'C') because modifiers on cases are not queryable.
4) FYI, we reverted most cases (except top-level folders like Diagnosis) back 
to folders (visual attribute begins with 'F').

Only the first is a true bug; the others are improvements.
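
Items 1 and 2 could be handled roughly as follows. This is a minimal Python sketch, not the code under discussion: the function names and the replacement table are assumptions for illustration, and the table would need to be extended to whatever characters actually appear in the spec.

```python
XML_PROLOG = '<?xml version="1.0" encoding="UTF-8"?>'

# Assumed list of typographic characters to normalize; extend as needed.
ASCII_REPLACEMENTS = {
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
}
_TABLE = str.maketrans(ASCII_REPLACEMENTS)


def to_ascii(text):
    """Map known typographic characters to ASCII; flag the rest with '?'."""
    text = text.translate(_TABLE)
    # Anything still outside ASCII becomes '?' so it is easy to spot later.
    return text.encode("ascii", errors="replace").decode("ascii")


def with_xml_prolog(metadataxml):
    """Prepend the XML declaration to a c_metadataxml value when it is missing."""
    if not metadataxml:
        return metadataxml          # NULL or empty: nothing to fix
    stripped = metadataxml.lstrip()
    if stripped.startswith("<?xml"):
        return stripped             # already has a declaration; keep it
    return XML_PROLOG + stripped
```

Running both over every text column before the ontology is loaded would keep the conversion in one place in the pipeline.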

Thanks,
Jeff K.


Jeffrey Klann, PhD
Instructor of Medicine, Harvard Medical School
Assistant in Computer Science, Massachusetts General Hospital
PhD in Research, Partners Healthcare Research Computing
ofc: 617-643-5879
email: [email protected]

> -----Original Message-----
> From: Murphy, Shawn N.
> Sent: Thursday, June 12, 2014 6:09 AM
> To: Klann, Jeffrey G.; Dan Connolly; [email protected]; Russ
> Waitman
> Cc: Matthew Hoag; Nathan Graham; [email protected]
> Subject: RE: [gpc-informatics] #109: mapping to PCORI CDM (aka mini-
> sentinel data model)
> 
> Hi Men,
> 
> I think the next step is to have these decisions reality tested with a
> true set of demo data that could be realistically ETL'd into i2b2.
> Note that common practice in i2b2 is for "raw" codes to go into the
> fact table, and for them to populate the ontology trees as children of
> the basic hierarchies.
> 
> Thanks,
> Shawn.
> 
> 
> -----Original Message-----
> From: Klann, Jeffrey G.
> Sent: Wednesday, June 11, 2014 5:19 PM
> To: Dan Connolly; [email protected]; Russ Waitman
> Cc: Matthew Hoag; Nathan Graham; [email protected]; Murphy, Shawn N.
> Subject: Re: [gpc-informatics] #109: mapping to PCORI CDM (aka mini-
> sentinel data model)
> 
> Dan, sounds good, I spent some time sync'ing up with your version
> (pathnames only) and here are the results of the coin toss afterward:
> 
> 1. DATE and TIME fields are implicit in my version. I added them to
> mine, but I left them hidden. It is more queryable if they are
> implicit, but I agree nice to have it in your face if it is an
> interoperability widget.
> 
> 2. ENROLLMENT\CHART is not a modifier in your version. This is somewhat
> arbitrary, but it's a little cleaner as a modifier I think. If it is a
> fact and not a modifier, you can't do a query in the UI for 'all with
> an enrollment start greater than 5/5/2005 who have CHART:Y during that
> enrollment event'. (It's the 'during that enrollment event' part you
> need it to be a modifier for.)
> 
> 3. DIAGNOSIS\DX_TYPE\ and PROCEDURE\PX_TYPE\ are compressed to just
> DIAGNOSIS\ and PROCEDURE\ in my version, with the types underneath.
> This just eliminates an unnecessary level of the tree and is easier to
> use.
> 
> 4. PROCEDURE\DRG_TYPE\ has just DRGs under it (and now I added null
> flavors), not 01\ and 02\. Perhaps I need to change this because it's
> inconsistent with everything else. But I thought it would be cleaner to
> create a combined tree. Probably not for this version, though.
> 
> 5. My version has ORIGDX and ORIGPX underneath Diagnoses and
> Procedures, respectively (and the null flavor codes are in this
> subtree). Shawn suggested this when he saw "Local Homegrown" under
> procedures - that's disappeared but this could facilitate populating
> the RAW_ fields. I'm still pondering this one and how it'd work. We
> don't expect sites to actually add modifiers to their fact table about
> whether this was mapped from a local code, it's something we'd want to
> pick up at query time.
> 
> To your questions: you're right, our basecodes don't have to sync up
> for querying. I was thinking ahead to collaborating on an ETL process
> and thinking it might be helpful. The query method (which you probably
> know) is full name -> dim code -> pathname (in concept dimension) ->
> concept_cd (in concept dimension). The c_basecode generally equals the
> concept dimension's concept_cd.
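
A minimal sketch of that lookup chain, using an in-memory SQLite database. The column names follow the usual i2b2 tables, but the schema is heavily simplified, and the paths, codes, and the `patients_for` helper are made up for illustration. It also shows a site-local "raw" code hanging under a PCORnet term, as Shawn describes.

```python
import sqlite3


def path(*parts):
    """Build an i2b2-style backslash-delimited path with a trailing backslash."""
    return "\\" + "\\".join(parts) + "\\"


def patients_for(conn, fullname):
    """Resolve full name -> dim code -> concept paths -> concept_cd -> facts."""
    row = conn.execute(
        "SELECT c_dimcode FROM ontology WHERE c_fullname = ?", (fullname,)
    ).fetchone()
    if row is None:
        return []
    dimcode = row[0]
    # Prefix match on concept_path; substr() avoids LIKE wildcards such as '_'.
    rows = conn.execute(
        "SELECT DISTINCT f.patient_num "
        "FROM observation_fact f "
        "JOIN concept_dimension c ON c.concept_cd = f.concept_cd "
        "WHERE substr(c.concept_path, 1, length(?)) = ? "
        "ORDER BY f.patient_num",
        (dimcode, dimcode),
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE ontology (c_fullname TEXT, c_dimcode TEXT, c_basecode TEXT);"
    "CREATE TABLE concept_dimension (concept_path TEXT, concept_cd TEXT);"
    "CREATE TABLE observation_fact (patient_num INTEGER, concept_cd TEXT);"
)
female = path("PCORI", "DEMOGRAPHIC", "SEX", "F")  # hypothetical path
conn.execute("INSERT INTO ontology VALUES (?, ?, ?)", (female, female, "SEX:F"))
conn.execute("INSERT INTO concept_dimension VALUES (?, ?)", (female, "SEX:F"))
# A local code added as a child of the PCORnet term.
conn.execute(
    "INSERT INTO concept_dimension VALUES (?, ?)", (female + "LOCAL_F\\", "LOCAL:F")
)
conn.executemany(
    "INSERT INTO observation_fact VALUES (?, ?)",
    [(1, "SEX:F"), (2, "LOCAL:F"), (3, "SEX:M")],
)
print(patients_for(conn, female))
```

Note that c_basecode never enters the query itself, which is why the two ontologies only need matching full names and dim codes to be query-compatible.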
> 
> I might have added spaces in manually. That's not a good solution
> though, I agree. The logic I used for capitalization could be used for
> spaces.
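
For example, the same capitalization logic with a space as the joiner (a sketch; `label_from_field` is a made-up name):

```python
def label_from_field(name):
    """Turn an upper-case, underscore-separated field name into a display label."""
    # Capitalize each underscore-separated word and join with spaces.
    return " ".join(part.capitalize() for part in name.split("_") if part)


print(label_from_field("HUMPTY_DUMPTY"))  # Humpty Dumpty
```

Acronyms come out as "Dx" rather than "DX", so a short list of exceptions might still be needed.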
> 
> Basecodes don't require colons, only if you want to search by code type
> in the UI. In which case the part before the colon is a scheme (code
> type to search by) and the part after is a code. So in cases where
> there is no code list (e.g., scalars) it doesn't make a lot of sense.
> Then again, I suppose it doesn't hurt anything. We could consider
> prepending everything with 'PCORNET|' to avoid naming conflicts with
> local sites' codes...
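
Roughly, as a sketch (the helper names and the sample codes are made up; only the 'PCORNET|' prefix comes from the paragraph above):

```python
def split_basecode(basecode):
    """Split 'SCHEME:CODE' into (scheme, code); scheme is None without a colon."""
    scheme, sep, code = basecode.partition(":")
    if not sep:
        return None, basecode   # e.g. a scalar with no code list
    return scheme, code


def namespaced(basecode, prefix="PCORNET|"):
    """Prepend the network prefix once, to avoid clashes with local codes."""
    if basecode.startswith(prefix):
        return basecode
    return prefix + basecode


print(split_basecode("DX_TYPE:09"))   # ('DX_TYPE', '09')
print(split_basecode("AGE"))          # (None, 'AGE')
print(namespaced("DX_TYPE:09"))       # PCORNET|DX_TYPE:09
```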
> 
> Ok, I'm out of time. I think we're very close!
> 
> 
> Jeffrey Klann, PhD
> Instructor of Medicine, Harvard Medical School
> Assistant in Computer Science, Massachusetts General Hospital
> PhD in Research Information Systems and Computing, Partners Healthcare
> ofc: 617-643-5879
> [email protected]
> 
> 
> 
> 
> 
> On 6/10/14, 11:13 AM, "Dan Connolly" <[email protected]> wrote:
> 
> >As close as we are, I think it would be a waste to not converge on one
> >ontology. It seems to be coin tosses from here on in.
> >
> >That said, it's not clear to me why shared basecodes are essential for
> >query compatibility. I've never seen a basecode in a query.
> >
> >On mapped vs. unmapped... I thought that was something in the earlier
> >drafts but not in CDM v1.0. oops. I'm not sure I understand how it
> >works. Care to elaborate with an example?
> >
> >I've been incorporating your code changes manually. I thought all
> >basecodes were supposed to have colons, but I suppose it doesn't
> >matter. I can take them off the scalars.
> >
> >On admit date and discharge date, I waffle back and forth. At first I
> >hid them, assuming they're subsumed by start_date and end_date; then I
> >removed the special case because I wanted to be sure it was "in my
> >face" as I thought through steps b, c, and d.
> >
> >I'm still not clear on how you came up with the labels. I tried the
> >code you added...
> >
> >>>> word="HUMPTY_DUMPTY"
> >>>> ''.join(x.capitalize() or '_' for x in word.split('_'))
> >'HumptyDumpty'
> >
> >Did you add spaces back in manually?
> >
> >More on steps b, c, and d separately...
> >
> >
> >--
> >Dan
> >
> >________________________________________
> >From: Klann, Jeffrey G. [[email protected]]
> >Sent: Monday, June 09, 2014 10:23 PM
> >To: [email protected]; Russ Waitman; Dan Connolly
> >Cc: Matthew Hoag; Nathan Graham; [email protected]; Murphy, Shawn N.
> >Subject: Re: [gpc-informatics] #109: mapping to PCORI CDM (aka
> >mini-sentinel data model)
> >
> >Hi Dan,
> >
> >Some comments:
> >
> >On A - I agree, we're nearing a workable ontology. Per our discussion
> >today, I gather that query compatibility across our networks is of
> >utmost importance. So we need to synchronize our fullnames and
> >basecodes in the PCORnet ontology. I started looking at differences
> >between implementations tonight. There are some things I need to
> >update that have changed (e.g., race codes are apparently now
> >prepended with a 0), but there are also some customizations we made
> >that we need to figure out how to handle. For example, Shawn suggested
> >a modifier for "Mapped" vs "Unmapped" rather than having (empty) trees
> >for homegrown ontologies. And third, I seem to have removed the : from
> >the base code if it is a scalar. I can't remember if I did that in the
> >code or manually. Any chance you can merge in the relevant parts of my
> >code changes?
> >
> >Also, do you want to try to synchronize our ontology representations
> >or just make them query-compatible? If the former, there is more work
> >to do but we'd both be maintaining one ontology, which could be
> >advantageous. I'll look at this more Tues & Wed but will also send you
> >my version of the ontology in a separate email in case you have time
> >to compare.
> >
> >Also, I noticed in a brief glance that you still have some date fields
> >explicit that I made implicit in mine (i.e., admit and discharge date)
> >- any reason for that? We're trying to prevent people from needing to
> >modify their fact table if possible, so a date constraint on a fact
> >makes more sense to me.
> >
> >Before step B, there are a couple other items on my plate. One, we
> >want some demo data to convince ourselves that the ontology works.
> >That's in process - a lot of the PCORnet terms that represent things
> >in the demo data are now queryable on i2b2.org through the CDM
> >ontology, just by modifications to the ontology. I haven't added facts
> >for things not in the data at all yet (which you're welcome to take on
> >if the mood strikes you). Two, we're also working on standardized
> >mapping processes to get i2b2 facts working with the PCORnet ontology,
> >through adding children of PCORnet items and extending dimension dim
> >codes.
> >
> >Step B, AFAIK, will be SQL, not SAS. I've heard they're writing an
> >adapter to convert their XML representation to SQL, so maybe we can
> >use the XML (perhaps not programmatically - perhaps only for humans to
> >hand-enter queries). Not 100% sure.
> >
> >Step C, I'm starting to think about the materialization of the CDM. I
> >wrote some SQL today that makes a nice table of ETL operations based
> >on the PCORnet ontology table - but it breaks anywhere there's a
> >special case - like: base codes under _CODETYPE paths actually
> >represent a code, not a code type; implicit dates are not represented,
> >etc. Options to handle this:
> >1) Special cases in SQL
> >2) Use other fields in the ontology, like comment
> >3) Define the ETL process from your Python code, not from the metadata
> >table. Possibly there is more power here, but it is not generalizable
> >beyond PCORnet which I don't like.
> >4) Use an open source ETL engine. Shawn suggested this, and I don't
> >know much about them, but I installed a couple today and they were all
> >giant and overbearing. Probably more than we need, with a steep
> >learning curve. But if you have experience, I'm open.
> >Thoughts?
> >
> >Well, that got long. Hope it's helpful.
> >
> >- Jeff K.
> >
> >Jeffrey Klann, PhD
> >Instructor of Medicine, Harvard Medical School
> >Assistant in Computer Science, Massachusetts General Hospital
> >PhD in Research Information Systems and Computing, Partners Healthcare
> >ofc: 617-643-5879
> >[email protected]
> >
> >
> >On 6/9/14, 4:30 PM, "GPC Informatics" <[email protected]> wrote:
> >
> >>#109: mapping to PCORI CDM (aka mini-sentinel data model)
> >>--------------------------+-----------------------------------
> >> Reporter:  rwaitman      |       Owner:  dconnolly
> >>     Type:  enhancement   |      Status:  accepted
> >> Priority:  major         |   Milestone:  initial-data-domains
> >>Component:  data-sharing  |  Resolution:
> >> Keywords:                |  Blocked By:  89
> >> Blocking:                |
> >>--------------------------+-----------------------------------
> >>
> >>Comment (by dconnolly):
> >>
> >> While Jeff and I are perhaps about done with representing the CDM in
> >> i2b2, other discussions suggest this is one of several parts of
> >> getting the whole thing working:
> >>
> >>   a. represent CDM in i2b2 #109
> >>   b. approximate a popmednet query (SAS script? plain text
> >> description?) by an i2b2 query
> >>   c. run the query and materialize the results a la the CDM
> >>   d. run the SAS code that came in via popmednet
> >>      - needs SAS environment #117?
> >>
> >> I gather the [http://scilhs.org/2014/03/11/scilhs-query-workflow/
> >> SCILHS Query Workflow] includes parts b, c, and d.
> >>
> >>--
> >>Ticket URL:
> >><http://informatics.gpcnetwork.org/trac/Project/ticket/109#comment:14>
> >>gpc-informatics <http://informatics.gpcnetwork.org/>
> >>Greater Plains Network - Informatics
> >

_______________________________________________
Gpc-dev mailing list
[email protected]
http://listserv.kumc.edu/mailman/listinfo/gpc-dev
