Great question, Nick. If you have a better idea for how to let any file
come into the cTAKES parser, have its text and metadata parsed out, and
then feed that into cTAKES, I’m all ears. We just thought that decorating
AutoDetectParser would serve that purpose for us. Since cTAKES just puts
metadata in the met object (as of now) and doesn’t do XHTML content
(future improvement), I suppose we could instantiate an AutoDetectParser
instead of decorating it, which may help. Dunno; anyway, looking forward
to your thoughts :-)
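
To make the "instantiate instead of decorate" idea a bit more concrete,
here is a rough sketch in plain Java. None of these types are Tika or
cTAKES classes: SimpleParser, DetectingParser, and AnnotatingParser are
hypothetical stand-ins, and the explicit parser list is an assumption
standing in for whatever the config or SPI would normally supply. The only
point is that the wrapper owns its delegate and the delegate's list never
contains the wrapper, so nothing can loop back.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WrapperSketch {

    /** Hypothetical stand-in for a parser: reads text, fills the metadata map. */
    interface SimpleParser {
        void parse(String text, Map<String, String> met);
    }

    /** Stand-in for the detecting parser: passes the input to every registered parser. */
    static final class DetectingParser implements SimpleParser {
        private final List<SimpleParser> registered;

        DetectingParser(List<SimpleParser> registered) {
            this.registered = registered;
        }

        @Override
        public void parse(String text, Map<String, String> met) {
            for (SimpleParser p : registered) {
                p.parse(text, met);
            }
        }
    }

    /** Stand-in for the cTAKES step: owns its delegate and only adds metadata. */
    static final class AnnotatingParser implements SimpleParser {
        private final SimpleParser delegate;

        AnnotatingParser(SimpleParser delegate) {
            this.delegate = delegate;
        }

        @Override
        public void parse(String text, Map<String, String> met) {
            delegate.parse(text, met);     // extract text and metadata first
            met.put("annotated", "true");  // then record that the annotation step ran
        }
    }

    /** Builds the chain from an explicit parser list that excludes the wrapper itself. */
    static Map<String, String> run(String text) {
        SimpleParser lengthParser =
                (t, met) -> met.put("length", String.valueOf(t.length()));
        SimpleParser wrapper =
                new AnnotatingParser(new DetectingParser(List.of(lengthParser)));
        Map<String, String> met = new LinkedHashMap<>();
        wrapper.parse(text, met);
        return met;
    }

    public static void main(String[] args) {
        System.out.println(run("hello"));
    }
}
```

If the wrapper were itself one of the parsers in the delegate's list, the
delegate would call back into the wrapper and never return, which is the
recursion we hit when it's enabled via SPI.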

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++




-----Original Message-----
From: Nick Burch <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Sunday, June 7, 2015 at 5:01 AM
To: "[email protected]" <[email protected]>
Subject: Re: svn commit: r1683969 -
/tika/trunk/tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser

>On Sun, 7 Jun 2015, Mattmann, Chris A (3980) wrote:
>> Also the lovely thing here too is that since cTAKESParser is a decorator
>> for AutoDetectParser there is magical infinite recursion if it’s enabled
>> via SPI.
>
>Should it really be a wrapper for AutoDetectParser though? I haven't read
>through the wiki page or the code yet (need to do that after lunch...),
>but my general guess would've been that a wrapping parser should sit
>between AutoDetectParser and DefaultParser? (AutoDetectParser normally
>calls to DefaultParser via the Tika config).
>
>If it worked that way, we could slip it in between the two in the tika
>config file.
>
>Though if someone could quickly point out why it needs to wrap outside
>AutoDetectParser rather than inside, that'd save time!
>
>Nick
