[
https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904156#comment-14904156
]
Nick Burch commented on TIKA-1739:
----------------------------------
My view is that {{AutoDetectParser}} is a special kind of parser decorator too.
It doesn't do any parsing itself; it decorates a set of other parsers by first
doing detection, then handing the detected type to them for processing. Because
of this, {{AutoDetectParser}} has a few extra restrictions: it can't be set in
the Tika Config file, you have to ask for it explicitly, and it must be the
outermost decorator.

As I understand it, the cTAKES decorator enhances the output of other parsers
with medical-related information, either for all parsers or just for some
types and parsers. As such, I believe it needs to go outside of
{{DefaultParser}}, or outside a collection of explicit parsers wrapped as a
{{CompositeParser}}: it waits for those real parser(s) to run, then
enhances/decorates their output. It also needs to be inside the
{{AutoDetectParser}} "decoration", because in many cases it has to wait for the
type to be found before it can work out whether it applies.

The key thing to remember - {{AutoDetectParser}} is not a parser! It's a
decorator on a set of other parsers, which finds the type first. You might give
your file to {{AutoDetectParser}}, but that isn't actually what does the work.
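
To make the nesting concrete, here is a minimal, self-contained toy sketch of
the ordering described above. It deliberately does not use the real Tika
classes: {{ToyParser}}, {{ToyTextParser}}, {{ToyEnhancer}} and
{{ToyAutoDetect}} are hypothetical stand-ins for a real parser, the enhancing
decorator and the detecting decorator, and the "detection" is a made-up prefix
check. It only illustrates why the enhancer sits between the detector and the
real parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DecoratorOrderSketch {
    /** Hypothetical stand-in for a parser; NOT the Tika Parser interface. */
    interface ToyParser {
        void parse(String content, String type, Map<String, List<String>> metadata);
    }

    /** Appends a value to a multi-valued metadata key. */
    static void add(Map<String, List<String>> md, String key, String value) {
        md.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** Plays the "real" parser: the only class that extracts anything. */
    static class ToyTextParser implements ToyParser {
        public void parse(String content, String type, Map<String, List<String>> md) {
            add(md, "X-Parsed-By", "ToyTextParser");
            add(md, "text", content.trim());
        }
    }

    /** Plays the enhancing decorator: lets the wrapped parser run, then adds to its output. */
    static class ToyEnhancer implements ToyParser {
        private final ToyParser wrapped;
        ToyEnhancer(ToyParser wrapped) { this.wrapped = wrapped; }

        public void parse(String content, String type, Map<String, List<String>> md) {
            wrapped.parse(content, type, md);   // wait for the real parser first
            add(md, "X-Parsed-By", "ToyEnhancer");
            if ("text/plain".equals(type)) {    // it needs the detected type to decide
                add(md, "enhanced", "true");
            }
        }
    }

    /** Plays the outermost detecting decorator: finds the type, then delegates. */
    static class ToyAutoDetect implements ToyParser {
        private final ToyParser wrapped;
        ToyAutoDetect(ToyParser wrapped) { this.wrapped = wrapped; }

        public void parse(String content, String ignoredType, Map<String, List<String>> md) {
            // Made-up detection rule, for illustration only.
            String detected = content.startsWith("%PDF") ? "application/pdf" : "text/plain";
            add(md, "Content-Type", detected);
            wrapped.parse(content, detected, md);
        }
    }

    /** Builds detector(enhancer(real parser)) and runs it on the content. */
    static Map<String, List<String>> run(String content) {
        ToyParser chain = new ToyAutoDetect(new ToyEnhancer(new ToyTextParser()));
        Map<String, List<String>> md = new LinkedHashMap<>();
        chain.parse(content, null, md);
        return md;
    }

    public static void main(String[] args) {
        System.out.println(run("patient has a fever"));
    }
}
```

In this toy model the caller hands the content to the detector, but the text
is extracted by {{ToyTextParser}}, and the enhancer only acts once the type is
known and the wrapped parser has run.
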
> cTAKESParser doesn't work in 1.11
> ---------------------------------
>
> Key: TIKA-1739
> URL: https://issues.apache.org/jira/browse/TIKA-1739
> Project: Tika
> Issue Type: Bug
> Components: parser, server
> Reporter: Chris A. Mattmann
> Assignee: Chris A. Mattmann
> Fix For: 1.11
>
> Attachments: TIKA-1739.patch
>
>
> Tika cTAKESParser integration doesn't work in 1.11. The parser is called, but
> blank metadata comes back:
> {noformat}
> curl -T test.txt -H "Content-Type: text/plain" http://localhost:9999/rmeta/text
> [{"Content-Type":"text/plain","X-Parsed-By":["org.apache.tika.parser.CompositeParser","org.apache.tika.parser.ctakes.CTAKESParser","org.apache.tika.parser.EmptyParser"],"X-TIKA:parse_time_millis":"20371","ctakes:schema":"coveredText:start:end:ontologyConceptArr"}
> {noformat}
> [~gagravarr] I wonder if something that happened in TIKA-1653 broke it?
> http://svn.apache.org/viewvc?view=revision&revision=1684199
> [~gostep] can you help me look here?
> I'm working on
> https://github.com/chrismattmann/shangridocs/tree/convert-wicket which is
> where I first saw this.