On Sun, 7 Jun 2015, Mattmann, Chris A (3980) wrote:
Great question, Nick. If you have a better idea on how to make it so that any file can come into the cTAKES parser, get its text and metadata parsed out, and then feed that into cTAKES, I’m all ears. We just thought that decorating AutoDetect would serve that purpose for us. Since cTAKES just puts metadata in the met object (as of now) and doesn’t do XHTML content (future improvement), I suppose we could instantiate an AutoDetectParser instead of decorating it, which may help. Dunno, anyways, looking forward to what your thoughts are :-)

I've had a go at this, and fixed a few Tika bugs on the way... You can now (as detailed in the javadoc) just do:
   AutoDetectParser parser = new AutoDetectParser(new CTAKESParser());
And you'll get auto-detection with cTAKES applied to the result.

Alternatively, if you want to turn on cTAKES support via config, e.g. for use with the Tika CLI or Tika Server, you just need a config file like:
  <properties>
    <parsers>
      <parser class="org.apache.tika.parser.ctakes.CTAKESParser">
         <parser class="org.apache.tika.parser.DefaultParser"/>
      </parser>
    </parsers>
  </properties>
(Example config file in SVN!)
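
For anyone unsure what the nesting in that config expresses, here is a rough, stdlib-only sketch of the wrapping idea: an outer parser hands the input to the parser nested inside it, then adds its own entries to the metadata. SimpleParser, PlainTextParser and AnnotatingParser are made-up names for illustration, not Tika classes, and the word count is only a stand-in for whatever cTAKES would actually contribute.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class DecoratorSketch {

    // Hypothetical stand-in for a parser: returns the text, may add metadata.
    interface SimpleParser {
        String parse(byte[] input, Map<String, String> metadata);
    }

    // Plays the role of the nested (inner) parser: it only extracts text.
    static class PlainTextParser implements SimpleParser {
        @Override
        public String parse(byte[] input, Map<String, String> metadata) {
            metadata.put("Content-Type", "text/plain");
            return new String(input, StandardCharsets.UTF_8);
        }
    }

    // Plays the role of the outer parser: delegate first, then annotate.
    static class AnnotatingParser implements SimpleParser {
        private final SimpleParser wrapped;

        AnnotatingParser(SimpleParser wrapped) {
            this.wrapped = wrapped;
        }

        @Override
        public String parse(byte[] input, Map<String, String> metadata) {
            // Let the wrapped parser do the text extraction.
            String text = wrapped.parse(input, metadata);
            // Made-up "annotation": a simple whitespace-separated word count.
            String trimmed = text.trim();
            int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
            metadata.put("sketch:wordCount", Integer.toString(words));
            return text;
        }
    }

    public static void main(String[] args) {
        // Same shape as the config above: the outer parser wraps the inner one.
        SimpleParser parser = new AnnotatingParser(new PlainTextParser());
        Map<String, String> metadata = new LinkedHashMap<>();
        byte[] input = "patient reports mild headache".getBytes(StandardCharsets.UTF_8);
        System.out.println(parser.parse(input, metadata));
        System.out.println(metadata);
    }
}
```

The inner parser knows nothing about the outer one, so swapping what sits inside the wrapper is a one-line change, which is roughly what the nested <parser> element lets you do from config.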


Does this work for everyone?

Nick
