[ https://issues.apache.org/jira/browse/TIKA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann resolved TIKA-674.
------------------------------------
Resolution: Fixed
- patch applied in r1621531. Thanks Andrzej!
> CompositeParser should indicate which parser was actually selected for parsing
> ------------------------------------------------------------------------------
>
> Key: TIKA-674
> URL: https://issues.apache.org/jira/browse/TIKA-674
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 0.10
> Reporter: Andrzej Bialecki
> Assignee: Chris A. Mattmann
> Fix For: 1.6
>
>
> If multiple parsers exist that support the same mime type, and
> AutoDetectParser (or another CompositeParser) is used, then the parse output
> does not indicate which of the alternative parsers was actually used. I think
> that the name of the parser (FQCN?) should be added to the metadata.
> Something like this trivial patch:
> {code}
> Index: tika-core/src/main/java/org/apache/tika/parser/CompositeParser.java
> ===================================================================
> --- tika-core/src/main/java/org/apache/tika/parser/CompositeParser.java (revision 1135167)
> +++ tika-core/src/main/java/org/apache/tika/parser/CompositeParser.java (working copy)
> @@ -238,6 +238,7 @@
> try {
> TikaInputStream taggedStream = TikaInputStream.get(stream, tmp);
> TaggedContentHandler taggedHandler = new TaggedContentHandler(handler);
> + metadata.add("X-Parsed-By", parser.getClass().getName());
> try {
> parser.parse(taggedStream, taggedHandler, metadata, context);
> } catch (RuntimeException e) {
> {code}
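
To illustrate the effect of the proposed change, here is a minimal, self-contained sketch. It does not use Tika itself: SimpleMetadata, SimpleParser, SimpleCompositeParser and the two toy parsers are hypothetical stand-ins, and the "first supporting parser wins" selection rule is an assumption of this sketch, not a statement about how CompositeParser resolves conflicts. Only the "X-Parsed-By" key and the parser.getClass().getName() call come from the patch above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ParsedBySketch {

    /** Hypothetical multi-valued metadata holder (not Tika's Metadata class). */
    public static class SimpleMetadata {
        private final Map<String, List<String>> values = new LinkedHashMap<>();

        public void add(String name, String value) {
            values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }

        public List<String> getValues(String name) {
            return values.getOrDefault(name, new ArrayList<>());
        }
    }

    /** Hypothetical, simplified parser interface. */
    public interface SimpleParser {
        boolean supports(String mimeType);

        String parse(String mimeType, String input, SimpleMetadata metadata);
    }

    /** Toy parser: upper-cases the input. */
    public static class UpperCaseParser implements SimpleParser {
        public boolean supports(String mimeType) {
            return "text/plain".equals(mimeType);
        }

        public String parse(String mimeType, String input, SimpleMetadata metadata) {
            return input.toUpperCase(Locale.ROOT);
        }
    }

    /** Toy parser for the same type: trims the input. */
    public static class TrimParser implements SimpleParser {
        public boolean supports(String mimeType) {
            return "text/plain".equals(mimeType);
        }

        public String parse(String mimeType, String input, SimpleMetadata metadata) {
            return input.trim();
        }
    }

    /** Delegates to the first parser that supports the type (assumption of this sketch). */
    public static class SimpleCompositeParser implements SimpleParser {
        private final List<SimpleParser> parsers;

        public SimpleCompositeParser(SimpleParser... parsers) {
            this.parsers = Arrays.asList(parsers);
        }

        public boolean supports(String mimeType) {
            return parsers.stream().anyMatch(p -> p.supports(mimeType));
        }

        public String parse(String mimeType, String input, SimpleMetadata metadata) {
            for (SimpleParser parser : parsers) {
                if (parser.supports(mimeType)) {
                    // Same idea as the patch: record the selected parser's class name.
                    metadata.add("X-Parsed-By", parser.getClass().getName());
                    return parser.parse(mimeType, input, metadata);
                }
            }
            throw new IllegalArgumentException("No parser for " + mimeType);
        }
    }

    public static void main(String[] args) {
        SimpleMetadata metadata = new SimpleMetadata();
        SimpleParser composite =
                new SimpleCompositeParser(new UpperCaseParser(), new TrimParser());
        composite.parse("text/plain", " abc ", metadata);
        System.out.println("X-Parsed-By: " + metadata.getValues("X-Parsed-By"));
    }
}
```

Because the value is added rather than set, nesting one composite inside another in this sketch would leave one "X-Parsed-By" entry per level of delegation, so a caller can see the whole chain and not only the innermost parser.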