Works great, thanks Nick. I’ll update the wiki once we release 1.10 since 1.9 will have the old way of doing it.
Thanks for this!

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Nick Burch <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, June 8, 2015 at 8:29 AM
To: "[email protected]" <[email protected]>
Subject: Re: svn commit: r1683969 - /tika/trunk/tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser

>On Sun, 7 Jun 2015, Mattmann, Chris A (3980) wrote:
>> Great question Nick. If you have a better idea on how to make it so that
>> any file can come into the cTAKES parser, get its text and metadata
>> parsed out, and then feed that into cTAKES I'm all ears. We just thought
>> that decorating AutoDetect would serve that purpose for us. Since cTAKES
>> just puts metadata in the met object (as of now) and doesn't do XHTML
>> content (future improvement), I supposed we could instantiate an
>> AutoDetectParser instead of decorating it which may help. Dunno, anyways
>> looking forward to what your thoughts are :-)
>
>I've had a go at this, and fixed a few Tika bugs on the way... You can now
>(as detailed in the javadoc) just do:
>  AutoDetectParser parser = new AutoDetectParser(new CTAKESParser());
>And you'll get auto-detection with cTAKES applied to the result.
>
>Alternately, if you want to turn on cTAKES support in config, for use eg
>with the Tika CLI or Tika Server, you just need a config file like:
>  <properties>
>    <parsers>
>      <parser class="org.apache.tika.parser.ctakes.CTAKESParser">
>        <parser class="org.apache.tika.parser.DefaultParser"/>
>      </parser>
>    </parsers>
>  </properties>
>(Example config file in SVN!)
>
>
>Does this work for everyone?
>
>Nick
