Tim Allison created TIKA-3788:
---------------------------------
Summary: Allow embedded exceptions and warnings to percolate to
the parent's metadata
Key: TIKA-3788
URL: https://issues.apache.org/jira/browse/TIKA-3788
Project: Tika
Issue Type: Improvement
Reporter: Tim Allison
As part of work on TIKA-3787, I'll add a ParseRecord to the ParseContext.
Parsers that handle embedded files can use it to record caught exceptions and
warning messages. The CompositeParser keeps track of the depth of its parse,
and when the depth returns to 0, it writes these exceptions and warnings to
the Metadata object.
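The mechanism above can be illustrated with a small, self-contained Java sketch. This is not Tika's code. The ParseRecord, Doc and parse names here are hypothetical stand-ins, and the metadata keys "sketch:embedded_exception" and "sketch:embedded_warning" are made up for illustration. The sketch shows one record shared across all depths: a parent catches an embedded failure, records it, and only the outermost call (depth back at 0) copies the recorded exceptions and warnings onto the container's metadata.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParseRecordDemo {

    /** Hypothetical, simplified stand-in for a parse record; NOT Tika's actual ParseRecord API. */
    static class ParseRecord {
        private int depth = 0;
        private final List<String> exceptions = new ArrayList<>();
        private final List<String> warnings = new ArrayList<>();

        void beforeParse() { depth++; }

        /** Decrements the depth and returns the new value (0 means the container is done). */
        int afterParse() { return --depth; }

        void addException(String msg) { exceptions.add(msg); }
        void addWarning(String msg) { warnings.add(msg); }
        List<String> getExceptions() { return exceptions; }
        List<String> getWarnings() { return warnings; }
    }

    /** Toy document: an optional failure, an optional warning, and embedded children. */
    static class Doc {
        final String name;
        final String failure;
        final String warning;
        final List<Doc> children;

        Doc(String name, String failure, String warning, List<Doc> children) {
            this.name = name;
            this.failure = failure;
            this.warning = warning;
            this.children = children;
        }
    }

    // Hypothetical metadata keys, chosen for illustration only.
    static final String EXCEPTION_KEY = "sketch:embedded_exception";
    static final String WARNING_KEY = "sketch:embedded_warning";

    /** Recursively "parses" a document, sharing one ParseRecord across all depths. */
    static void parse(Doc doc, ParseRecord record, Map<String, List<String>> metadata) {
        record.beforeParse();
        try {
            if (doc.failure != null) {
                throw new IllegalStateException(doc.name + ": " + doc.failure);
            }
            if (doc.warning != null) {
                record.addWarning(doc.name + ": " + doc.warning);
            }
            for (Doc child : doc.children) {
                try {
                    // Each embedded file gets its own metadata object, discarded here.
                    parse(child, record, new LinkedHashMap<>());
                } catch (IllegalStateException e) {
                    // The parent swallows the embedded failure but records it.
                    record.addException(e.getMessage());
                }
            }
        } finally {
            if (record.afterParse() == 0) {
                // Depth is back to 0: percolate what was recorded to the container's metadata.
                if (!record.getExceptions().isEmpty()) {
                    metadata.computeIfAbsent(EXCEPTION_KEY, k -> new ArrayList<>())
                            .addAll(record.getExceptions());
                }
                if (!record.getWarnings().isEmpty()) {
                    metadata.computeIfAbsent(WARNING_KEY, k -> new ArrayList<>())
                            .addAll(record.getWarnings());
                }
            }
        }
    }

    public static void main(String[] args) {
        Doc bad = new Doc("bad.docx", "truncated zip", null, List.of());
        Doc odd = new Doc("odd.pdf", null, "unmapped font", List.of());
        Doc mail = new Doc("mail.eml", null, null, List.of(bad, odd));
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        parse(mail, new ParseRecord(), metadata);
        // prints {sketch:embedded_exception=[bad.docx: truncated zip], sketch:embedded_warning=[odd.pdf: unmapped font]}
        System.out.println(metadata);
    }
}
```

Tracking depth in one shared record, rather than passing results up through return values, lets any parser at any nesting level record a problem without the intermediate parsers having to know about it.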
I would still highly recommend /rmeta, -J, or the RecursiveParserWrapper, but
this new capability adds some functionality to the standard /tika endpoint
(with JSON output) and, programmatically, to the AutoDetectParser.
Because this information is added to the metadata object _after_ the parse, it
will not come through in streaming contexts, where the metadata object is
written to the XHTML before the content of the file is parsed. So this will
not add any benefit to /tika with text/html output.
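The ordering problem above can be shown with a toy Java sketch. It is not Tika's serialization code: the "<head>"/"<body>" strings, the parse(boolean) helper and the "sketch:embedded_exception" key are all hypothetical. In streaming mode the metadata is written before the content, so an entry added after the parse never reaches the output; when the metadata is serialized only after the parse (as a stand-in for JSON output), the late entry is included.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StreamingOrderDemo {

    // Hypothetical metadata key, for illustration only.
    static final String EXCEPTION_KEY = "sketch:embedded_exception";

    /**
     * Simulates one parse. The embedded exception is added to the metadata only
     * after the content has been emitted, mirroring a post-parse percolation step.
     */
    static String parse(boolean streaming) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        StringBuilder out = new StringBuilder();
        if (streaming) {
            // Streaming output: metadata is serialized up front, before parsing content.
            out.append("<head>").append(metadata).append("</head>");
        }
        out.append("<body>content</body>");
        // Post-parse step: percolated embedded exceptions reach the metadata only now.
        metadata.computeIfAbsent(EXCEPTION_KEY, k -> new ArrayList<>())
                .add("bad.docx: truncated zip");
        if (!streaming) {
            // Buffered output: metadata is serialized after the parse, so it sees the late entry.
            out.append(metadata);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(parse(true));   // prints <head>{}</head><body>content</body>
        System.out.println(parse(false));  // prints <body>content</body>{sketch:embedded_exception=[bad.docx: truncated zip]}
    }
}
```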
--
This message was sent by Atlassian Jira
(v8.20.7#820007)