Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change 
notification.

The "cTAKESParser" page has been changed by ChrisMattmann:
https://wiki.apache.org/tika/cTAKESParser?action=diff&rev1=7&rev2=8

Comment:
- update instructions

  
  = Prepare your CTAKES configuration properties file =
  
- The cTAKESParser requires a configuration properties file. You can find an 
example 
[[https://issues.apache.org/jira/secure/attachment/12737116/CTAKESConfig.properties|here]]
 on [[https://issues.apache.org/jira/browse/TIKA-1645|TIKA-1645]].
+ The cTAKESParser requires a configuration properties file. You can find an 
example 
[[https://raw.githubusercontent.com/chrismattmann/ctakesparser-utils/master/config/org/apache/tika/parser/ctakes/CTAKESConfig.properties|here]].
 It was originally attached to 
[[https://issues.apache.org/jira/browse/TIKA-1645|TIKA-1645]] and is now 
adapted and maintained on GitHub in 
[[https://github.com/chrismattmann/ctakesparser-utils/|ctakesparser-utils]].
  
  Edit it as follows.
  
@@ -58, +58 @@

  You will need to place the CTAKESConfig.properties file in a classpath 
directory, e.g., org/apache/tika/parser/ctakes and include it on the classpath 
when calling the parser. Follow these steps:
  
   1. `mkdir -p $HOME/src/ctakes-config/org/apache/tika/parser/ctakes && cd 
$HOME/src/ctakes-config/org/apache/tika/parser/ctakes`
-  2. `curl -kO 
"https://issues.apache.org/jira/secure/attachment/12737116/CTAKESConfig.properties"`
+  2. `curl -kO 
"https://raw.githubusercontent.com/chrismattmann/ctakesparser-utils/master/config/org/apache/tika/parser/ctakes/CTAKESConfig.properties"`
  
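The steps above only work if the properties file ends up under the package path `org/apache/tika/parser/ctakes` inside the directory you later put on the classpath. As a minimal sketch, the layout could be checked with a small shell function. The function name `check_ctakes_config` is hypothetical and not part of Tika or cTAKES.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tika): verify that CTAKESConfig.properties
# sits in the expected package directory under a given classpath root.
check_ctakes_config() {
    root="$1"
    file="$root/org/apache/tika/parser/ctakes/CTAKESConfig.properties"
    if [ -f "$file" ]; then
        echo "OK: $file"
    else
        # Report the path that was expected so the layout can be fixed.
        echo "MISSING: $file" >&2
        return 1
    fi
}
```

For the directory used in the steps above, the call would be `check_ctakes_config "$HOME/src/ctakes-config"`.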
  = Setting up the Tika Config file =
  
- You will need a custom Tika configuration file for the parser. You can find 
one 
[[here|https://issues.apache.org/jira/secure/attachment/12737115/tika-config.xml]].
 The reason is that since cTAKESParser decorates AutoDetectParser, in reality, 
cTAKESParser can handle *any* kind of file type that it can. But you have to 
make cTAKESParser intercept the mime types you want it to extract biomedical 
information from. So if you want Tika and its cTAKESParser to etxract 
biomedical information from application/pdf files, you will need this custom 
config and to add application/pdf as a mime that the parser can deal with. The 
default config provided looks like:
+ You will need a custom Tika configuration file for the parser. You can find 
one 
[[https://raw.githubusercontent.com/chrismattmann/ctakesparser-utils/master/config/tika-config.xml|here]].
 Because cTAKESParser decorates AutoDetectParser, it can in principle handle 
*any* file type that AutoDetectParser can. However, you must tell cTAKESParser 
which mime types to intercept for biomedical information extraction. For 
example, if you want Tika and its cTAKESParser to extract biomedical 
information from application/pdf files, you need this custom config and must 
add application/pdf as a mime type that the parser handles. The default config 
provided looks like:
  
  {{{
  <?xml version="1.0" encoding="UTF-8" standalone="no"?>
@@ -70, +70 @@

    <parsers>
      <parser class="org.apache.tika.parser.ctakes.CTAKESParser">
        <mime>application/x-isatab</mime>
+       <parser class="org.apache.tika.parser.DefaultParser"/>
      </parser>
    </parsers>
  </properties>
@@ -84, +85 @@

      <parser class="org.apache.tika.parser.ctakes.CTAKESParser">
        <mime>application/x-isatab</mime>
        <mime>application/pdf</mime>
+       <parser class="org.apache.tika.parser.DefaultParser"/>
      </parser>
    </parsers>
  </properties>
@@ -94, +96 @@

  To download and set up the custom Tika config, do the following.
  
   1. `cd $HOME/src/ctakes-config`
-  2. `curl -kO 
"https://issues.apache.org/jira/secure/attachment/12737115/tika-config.xml"`
+  2. `curl -kO 
"https://raw.githubusercontent.com/chrismattmann/ctakesparser-utils/master/config/tika-config.xml"`
  
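Since cTAKESParser only intercepts the mime types listed in the config, it can help to confirm that the downloaded file contains the entry you need before running Tika. The following is a rough sketch. The function name `config_lists_mime` is hypothetical, and it does a plain-text match rather than parsing the XML, so it does not check which `<parser>` element the entry belongs to.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tika): succeed if the given Tika config
# file contains a <mime> entry for the given type.
# Assumption: plain-text match only; the XML structure is not validated.
config_lists_mime() {
    config="$1"
    mime="$2"
    grep -qF "<mime>$mime</mime>" "$config"
}
```

For example, `config_lists_mime tika-config.xml application/pdf || echo "application/pdf is not listed yet"` would flag a config that still needs the PDF entry added.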
  = Putting it all together: Tika-App =
  
