[
https://issues.apache.org/jira/browse/TIKA-4638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4638.
-------------------------------
Resolution: Fixed
> Unify sax "style" configurations in 4.x
> ---------------------------------------
>
> Key: TIKA-4638
> URL: https://issues.apache.org/jira/browse/TIKA-4638
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Minor
>
> We've had an ongoing need for simple user configuration of:
> a) whether to include embedded filenames in the SAX output
> b) whether to include the metadata title in the SAX output
> Further, with RMETA or the JSON output of CONCATENATE, if a user wants XHTML
> as the SAX output type, there is typically no need to dump the metadata into
> the XHTML. We should make this configurable as well.
> The key point here and on TIKA-4633 is that the user should only have to
> touch one logical configuration object, even though different underlying
> components in Tika will act on it. For example, in this case, the
> metadata and title output is handled in the XHTMLContentHandler, and the
> embedded filenames would be handled in the ParsingEmbeddedDocumentExtractor.
> I think for some of the config objects, we should simplify for the user's
> sake and not require them to know the underlying components.
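
To make the "one logical configuration object" idea concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical: SaxStyleConfig, SketchXhtmlWriter, SketchEmbeddedWriter, and the three flag names are invented for illustration and are not Tika's actual classes or the configuration that was committed for this issue. The two writer classes only stand in for the roles the description assigns to the XHTMLContentHandler and the ParsingEmbeddedDocumentExtractor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single config object the user would touch. Not a real Tika class.
final class SaxStyleConfig {
    final boolean includeEmbeddedFileNames; // item (a) in the description
    final boolean includeTitle;             // item (b) in the description
    final boolean includeMetadataInXhtml;   // the RMETA / CONCATENATE case

    SaxStyleConfig(boolean includeEmbeddedFileNames, boolean includeTitle,
                   boolean includeMetadataInXhtml) {
        this.includeEmbeddedFileNames = includeEmbeddedFileNames;
        this.includeTitle = includeTitle;
        this.includeMetadataInXhtml = includeMetadataInXhtml;
    }
}

// Stand-in for the component that writes the title and metadata (assumed role).
final class SketchXhtmlWriter {
    static List<String> head(SaxStyleConfig cfg, String title,
                             String metaName, String metaValue) {
        List<String> out = new ArrayList<>();
        if (cfg.includeTitle) {
            out.add("<title>" + title + "</title>");
        }
        if (cfg.includeMetadataInXhtml) {
            out.add("<meta name=\"" + metaName + "\" content=\"" + metaValue + "\"/>");
        }
        return out;
    }
}

// Stand-in for the component that handles embedded documents (assumed role).
final class SketchEmbeddedWriter {
    static List<String> embedded(SaxStyleConfig cfg, String fileName, String text) {
        List<String> out = new ArrayList<>();
        if (cfg.includeEmbeddedFileNames) {
            out.add("<h1>" + fileName + "</h1>");
        }
        out.add("<p>" + text + "</p>");
        return out;
    }
}

class SaxStyleSketch {
    public static void main(String[] args) {
        // The user sets the flags once; both components read the same object.
        SaxStyleConfig cfg = new SaxStyleConfig(false, true, false);
        System.out.println(SketchXhtmlWriter.head(cfg, "Report", "author", "someone"));
        System.out.println(SketchEmbeddedWriter.embedded(cfg, "image1.png", "embedded text"));
    }
}
```

The point of the sketch is only the shape: the user never names the two underlying components, and each component reads just the flags it cares about from the shared object.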