Hi Justin,

A shot in the dark:
You could create a collection reader that works similarly to 
org.apache.ctakes.core.cr.FilesInDirectoryCollectionReader, but instead of 
grabbing all of the files in a directory it grabs all of the records parsed 
from a single .xml file and runs the pipeline once per record.  Basically, swap 
the directory for a .xml file, and each text file for an XML element containing 
a record.
Somebody out there might already have something that does this.
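
To make the idea a bit more concrete, here is a rough sketch of just the 
record-splitting half, in plain Java with no cTAKES or UIMA classes involved. 
The element and attribute names (RECORD, ID, TEXT) and the class name 
XmlRecordSplitter are assumptions for illustration only, so check them against 
your actual file. A collection reader built on this would hand out one record 
per document instead of one file per document.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Splits one multi-record XML file into (id, text) pairs, one per record. */
final class XmlRecordSplitter {

    /** One patient record: the record's id attribute and its note text. */
    static final class PatientRecord {
        public final String id;
        public final String text;

        PatientRecord(String id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    // ASSUMPTION: these names are placeholders, not taken from the real data set.
    static final String RECORD_TAG = "RECORD";
    static final String ID_ATTRIBUTE = "ID";
    static final String TEXT_TAG = "TEXT";

    /** Parses the whole XML stream and returns its records in document order. */
    static List<PatientRecord> readRecords(InputStream xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(xml);
            NodeList recordNodes = doc.getElementsByTagName(RECORD_TAG);
            List<PatientRecord> records = new ArrayList<>();
            for (int i = 0; i < recordNodes.getLength(); i++) {
                Element record = (Element) recordNodes.item(i);
                // Use the first text element inside the record, if there is one.
                NodeList textNodes = record.getElementsByTagName(TEXT_TAG);
                String text = textNodes.getLength() > 0
                        ? textNodes.item(0).getTextContent().trim()
                        : "";
                records.add(new PatientRecord(record.getAttribute(ID_ATTRIBUTE), text));
            }
            return records;
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers deal with one exception type.
            throw new IllegalArgumentException("Could not parse record XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample input, made up for this sketch.
        String sample = "<ROOT>"
                + "<RECORD ID=\"1\"><TEXT>First patient note.</TEXT></RECORD>"
                + "<RECORD ID=\"2\"><TEXT>Second patient note.</TEXT></RECORD>"
                + "</ROOT>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        for (PatientRecord record : readRecords(in)) {
            System.out.println(record.id + ": " + record.text);
        }
    }
}
```

In a real reader you would parse the file once at initialization, then have 
each call for the next document set the document text from one record and 
store that record's id with it, so the output can be labelled by patient id.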

Sean

-----Original Message-----
From: Justin Zhang [mailto:justinzhang...@gmail.com] 
Sent: Wednesday, August 05, 2015 6:40 PM
To: u...@ctakes.apache.org; dev@ctakes.apache.org
Subject: how to run i2b2 data

Hello everyone,

I am running ctakes with i2b2 data
https://www.i2b2.org/NLP/DataSets/Main.php

Each xml file contains multiple patient records. I am able to split the 
patients out into separate files and process them with "runCPE.sh".

Is there a way to convert this single xml file into a format that cTAKES 
accepts, process it as a single input file, and generate a single output file 
(with results labelled by patient id)? For example, each patient id has a 
"smoking status".

Thanks,

--
Justin
