I broke #431<https://informatics.gpcnetwork.org/trac/Project/ticket/431> into 
two targets as I see them:

  *   #572<https://informatics.gpcnetwork.org/trac/Project/ticket/572>: 
      enterprise scale unstructured text notes, de-identified, in i2b2
  *   #573<https://informatics.gpcnetwork.org/trac/Project/ticket/573>: 
      de-identified text notes on a cohort-by-cohort basis

Most of what is involved in the cohort-by-cohort approach is also needed for 
the enterprise scale approach, but cohort-by-cohort is a smaller lift and it 
meets the requirements of some known use cases. Rolling out #573 in the medium 
term with #572 as a stretch goal is something I can get my head around.

The tickets include use cases; for enterprise scale:

  *   Investigator works on an i2b2 query for a genetic marker, in 
collaboration with an honest broker at a site such as Maren at KUMC
  *   Maren distributes the query to GPC sites via babel and the investigator 
submits a GPC DROC request
  *   each honest broker executes the query against their i2b2 and so on as 
described in RC11 of the GPC phase 1 
proposal<https://informatics.gpcnetwork.org/trac/Project/wiki/DataSecurity#providing-data>
     *   DataBuilder<https://informatics.gpcnetwork.org/trac/Project/wiki/DataBuilder> 
         query results from each site include relevant de-identified notes

And for cohort-by-cohort:

The main use case identified at 
HackathonFour<https://informatics.gpcnetwork.org/trac/Project/wiki/HackathonFour> 
was de-identified chart review.


Feasibility of this approach is supported by Tim's success at deploying 
de-id-docker<https://github.com/MCW-BMI/de-id-docker> in a matter of hours 
after George's presentation at 
HackathonFour<https://informatics.gpcnetwork.org/trac/Project/wiki/HackathonFour>.

Note that the underlying technology is the same one that KUMC adopted from MCW 
for enterprise scale use:

  *   MCW_BMI/unstructured-notes-deidentification<https://bitbucket.org/MCW_BMI/unstructured-notes-deidentification>

Another use case is distributed query à la PopMedNet. Running federated queries 
"lights out" would involve something like:

  1.  Using 
PortQuery<https://informatics.gpcnetwork.org/trac/Project/wiki/PortQuery> to 
run the i2b2 cohort query locally and noting the resulting patient set ID
  2.  Invoking the docker container to extract the notes
  3.  Running the distributed analysis code

Combining 2 and 3 into a Jenkins job seems straightforward.
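
As a rough illustration of what such a job might run, here is a minimal Python 
sketch that chains steps 2 and 3 and stops if either one fails. Everything 
specific in it is an assumption made for illustration: the image tag 
`example/de-id-docker`, the `PATIENT_SET_ID` environment variable, the `/out` 
mount point and the `analysis.py` script are placeholders, not the actual 
interface of de-id-docker or of any GPC analysis code. Only the `docker run` 
options `--rm`, `-e` and `-v` are standard.

```python
import subprocess
from typing import Callable, List

# Placeholders for illustration only; not taken from de-id-docker or GPC code.
DEID_IMAGE = "example/de-id-docker"        # hypothetical image tag
ANALYSIS_CMD = ["python", "analysis.py"]   # hypothetical analysis script


def build_steps(patient_set_id: int, notes_dir: str) -> List[List[str]]:
    """Return the command lines for steps 2 and 3, in order.

    notes_dir should be an absolute host path, since it is used as a
    docker bind mount.
    """
    extract = [
        "docker", "run", "--rm",
        # Hypothetical variable telling the container which cohort to extract.
        "-e", f"PATIENT_SET_ID={patient_set_id}",
        # Bind-mount a host directory to receive the de-identified notes;
        # the /out path inside the container is an assumption.
        "-v", f"{notes_dir}:/out",
        DEID_IMAGE,
    ]
    analyze = ANALYSIS_CMD + ["--notes", notes_dir]
    return [extract, analyze]


def run_steps(steps: List[List[str]],
              runner: Callable[[List[str]], object] = subprocess.check_call) -> None:
    """Run each command in order.

    With the default runner, subprocess.check_call raises CalledProcessError
    on a non-zero exit status, so step 3 never runs if step 2 fails.
    """
    for cmd in steps:
        runner(cmd)
```

A Jenkins job could take the patient set ID noted in step 1 as a build 
parameter and call `run_steps(build_steps(patient_set_id, notes_dir))`. 
Passing a different `runner` makes it possible to dry-run the job and just log 
the commands.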

--
Dan

________________________________
From: Dan Connolly
Sent: Wednesday, January 18, 2017 5:19 PM
To: Russ Waitman; Taylor, Bradley
Cc: <[email protected]>
Subject: RE: unstructured text notes: refining the target (#431)

Russ, Brad, w.r.t. figure 4 of our proposal, what is "NLP derived 
concepts"?

http://frontiersresearch.org/frontiers/sites/default/files/Phase%20II%20Proposal.pdf

--
Dan

________________________________
From: Dan Connolly
Sent: Tuesday, January 17, 2017 12:24 PM
To: Russ Waitman; Taylor, Bradley
Cc: <[email protected]>
Subject: unstructured text notes: refining the target (#431)

Russ, Brad (when you get back),

I'd like to get a few concrete use cases as targets for this deliverable so 
that we can get tangible experience with what's required and what would be 
nice-to-have.

MCW and IU both report that they tried de-identifying all their notes and 
putting them in i2b2, and concluded that it was unwieldy. MCW now does 
de-identification on a cohort-by-cohort basis. I'm not sure how to 
characterize the IU approach.

The cohort-by-cohort basis suffices for GPC needs, as far as I can tell.

For example: suppose investigators specify, in their GPC DROC request, that 
progress notes are part of the data that they want. Then each site runs their 
cohort query and delivers notes for that cohort. The MCW process should work 
well as a recommended method but other methods would be acceptable if a site 
(such as IU) already has a suitable process.

Perhaps one concrete case would be: progress notes for the ALS cohort, since 
it's small, then try the breast cancer cohort. Or are there other cohorts where 
we have a customer demand for notes?

For reference: #431<https://informatics.gpcnetwork.org/trac/Project/ticket/431>

--
Dan

_______________________________________________
Gpc-dev mailing list
[email protected]
http://listserv.kumc.edu/mailman/listinfo/gpc-dev
