[Tika Wiki] Update of "cTAKESParser" by ChrisMattmann

Apache Wiki Sat, 06 Jun 2015 10:36:56 -0700


The "cTAKESParser" page has been changed by ChrisMattmann:
https://wiki.apache.org/tika/cTAKESParser?action=diff&rev1=2&rev2=3

  = Signing up for a UMLS account =
  
  To use cTAKES and the cTAKES Tika Parser you need a Unified Medical Language 
System (UMLS) account.
- You can sign up for one [[https://uts.nlm.nih.gov/home.html|here]].
+ You can sign up for one [[https://uts.nlm.nih.gov/home.html|here]]. It can take up to 3 business
+ days for an account to be approved, so be patient. Once your account is approved, you can use the
+ cTAKESParser and read on. A planned future improvement is to provide a means to include the offline UMLS dictionary.
  
+ = Prepare your cTAKES configuration properties file =
+ 
+ The cTAKESParser requires a configuration properties file. You can find an example [[https://issues.apache.org/jira/secure/attachment/12737116/CTAKESConfig.properties|here]] on [[https://issues.apache.org/jira/browse/TIKA-1645|TIKA-1645]].
+ 
+ Edit it as follows (replace $HOME below with your actual home directory path).
+ 
+ {{{
+ aeDescriptorPath=$HOME/desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextUMLSProcessor.xml
+ text=false
+ annotationProps=BEGIN,END,ONTOLOGY_CONCEPT_ARR
+ separatorChar=:
+ metadata=Study Title,Study Description
+ UMLSUser=your_username
+ UMLSPass=your_password
+ }}}
+ 
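As a quick sanity check before handing the file to Tika, the settings above can be read and inspected with a few lines of Python. This is only a minimal sketch: `load_ctakes_config` and the `/home/alice` path are hypothetical names chosen for illustration, and the parser handles plain `key=value` lines only, not the full Java `.properties` syntax.

```python
def load_ctakes_config(text, home):
    """Parse simple key=value lines into a dict (hypothetical helper, not part of Tika)."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines without a key=value separator
        config[key.strip()] = value.strip()
    # Replace the $HOME placeholder with a real directory, as the page instructs.
    if "aeDescriptorPath" in config:
        config["aeDescriptorPath"] = config["aeDescriptorPath"].replace("$HOME", home)
    # Split the comma-separated settings into lists.
    for key in ("annotationProps", "metadata"):
        if key in config:
            config[key] = [item.strip() for item in config[key].split(",")]
    # Interpret the text flag as a boolean.
    config["text"] = config.get("text", "false").lower() == "true"
    return config


sample = """\
aeDescriptorPath=$HOME/desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextUMLSProcessor.xml
text=false
annotationProps=BEGIN,END,ONTOLOGY_CONCEPT_ARR
separatorChar=:
metadata=Study Title,Study Description
"""

config = load_ctakes_config(sample, home="/home/alice")  # assumed home directory
print(config["aeDescriptorPath"])
print(config["annotationProps"], config["metadata"], config["text"])
```

Printing the parsed values makes it easy to spot a leftover `$HOME` or a mistyped field name before running the parser.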
+ Analysis is performed on the text and/or metadata extracted by the AutoDetectParser. cTAKESParser decorates AutoDetectParser, takes the extracted metadata, the extracted text, or both, and adds metadata prefixed with ctakes: for procedures, medications, diseases and other extracted information.
+ To use the cTAKESParser, set the `metadata` property to a comma-separated list of the metadata fields to search for medical terminology. If you would also like the parser to search the text extracted by Tika, set `text=true`.
+ 
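The effect of the `metadata` and `text` settings can be pictured with a small Python sketch. It is an illustration of the selection logic described above, not Tika's actual implementation; `select_inputs` and the sample values are hypothetical.

```python
def select_inputs(extracted_metadata, extracted_text, metadata_fields, use_text):
    """Collect the strings that would be searched for medical terminology.

    Hypothetical helper: mirrors the described behaviour, not Tika's code.
    """
    inputs = []
    if use_text and extracted_text:
        inputs.append(extracted_text)  # only when text=true
    for field in metadata_fields:
        value = extracted_metadata.get(field)
        if value:
            inputs.append(value)  # only fields listed in the metadata property
    return inputs


# Made-up extraction results for illustration.
metadata = {"Study Title": "Mantle cell lymphoma trial", "Author": "J. Doe"}
body = "Patient was given aspirin."
fields = ["Study Title", "Study Description"]

metadata_only = select_inputs(metadata, body, fields, use_text=False)
with_text = select_inputs(metadata, body, fields, use_text=True)
print(metadata_only)
print(with_text)
```

Fields that are not listed in `metadata` (here `Author`) are ignored, and listed fields that the document lacks (here `Study Description`) are simply skipped.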
+ `annotationProps` is a comma-separated list of the cTAKES properties to extract, and `separatorChar` is the character used to separate them in the extracted field. So, we are telling cTAKES to extract first the begin and end of the found text (BEGIN,END), and then the ontology concept array (ONTOLOGY_CONCEPT_ARR). UMLS assigns each term an identifier that can be used to search UMLS for more information about that term; this array includes the UMLS pointer to the term and to any similar terms that were also identified. An example of the extracted annotationProps would be:
+ 
+ `mantle cell lymphoma:40592:40612:C0334634,C0334634,C0334634,C0334634`
+ 
+ In this example, a cTAKES DiseaseDisorderMention of `mantle cell lymphoma` is identified, followed by its associated annotation props: the text begins at position 40592 and ends at position 40612 (which could be used for highlighting), and an associated array of medical ontology concept identifiers is provided, i.e., C0334634,C0334634,C0334634,C0334634.
+
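A consumer of these metadata values might split them back into their parts along the following lines. This is a minimal sketch that assumes the layout shown in the example (matched text first, then one field per entry in `annotationProps`); `parse_annotation` is a hypothetical helper, not part of Tika or cTAKES.

```python
def parse_annotation(value, annotation_props, separator=":"):
    """Split an extracted value into the matched text and its annotation props.

    Hypothetical helper. Splitting from the right keeps the matched text intact
    even if it happens to contain the separator character.
    """
    parts = value.rsplit(separator, len(annotation_props))
    result = {"text": parts[0]}
    for name, field in zip(annotation_props, parts[1:]):
        if name in ("BEGIN", "END"):
            result[name] = int(field)  # character offsets in the analyzed text
        elif name == "ONTOLOGY_CONCEPT_ARR":
            result[name] = field.split(",")  # UMLS concept identifiers
        else:
            result[name] = field
    return result


value = "mantle cell lymphoma:40592:40612:C0334634,C0334634,C0334634,C0334634"
props = ["BEGIN", "END", "ONTOLOGY_CONCEPT_ARR"]

parsed = parse_annotation(value, props)
unique_concepts = sorted(set(parsed["ONTOLOGY_CONCEPT_ARR"]))
print(parsed["text"], parsed["BEGIN"], parsed["END"], unique_concepts)
```

In this example the distance between the two offsets equals the length of the matched text, which is what makes them usable for highlighting.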
