[ https://issues.apache.org/jira/browse/TIKA-4638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18055154#comment-18055154 ]

ASF GitHub Bot commented on TIKA-4638:
--------------------------------------

tballison merged PR #2557:
URL: https://github.com/apache/tika/pull/2557




> Unify sax "style" configurations in 4.x
> ---------------------------------------
>
>                 Key: TIKA-4638
>                 URL: https://issues.apache.org/jira/browse/TIKA-4638
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Minor
>
> We've had an ongoing need for easy user configuration of:
> a) whether to include embedded filenames in the SAX output
> b) whether to include the metadata title in the SAX output
> Further, with RMETA or the JSON output of CONCATENATE, if a user wants XHTML 
> as the SAX output type, there is typically no need to write the metadata into 
> the XHTML, since it is already present in the JSON. We should make this 
> configurable as well.
> The key point here, as on TIKA-4633, is that the user should only have to 
> touch one logical configuration object, even though different underlying 
> components in Tika act on it. For example, in this case, the metadata and 
> title output is handled by the XHTMLContentHandler, and the embedded 
> filenames would be handled by the ParsingEmbeddedDocumentExtractor.
> For some of the config objects, I think we should simplify things for the 
> user's sake and not require them to know about the underlying components.
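The "one logical configuration object, several consumers" idea can be sketched as follows. This is a minimal, hypothetical illustration, not Tika's 4.x API: `SaxOutputConfig`, `HeadWriter`, `EmbeddedMarker`, and their fields are invented for this example. `HeadWriter` and `EmbeddedMarker` only stand in for the roles that XHTMLContentHandler and ParsingEmbeddedDocumentExtractor play. The user builds one config object, and each component reads only the flags relevant to it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical user-facing config: the only object a user would touch.
final class SaxOutputConfig {
    final boolean includeEmbeddedFilenames; // (a) embedded filenames in the SAX output
    final boolean includeMetadataTitle;     // (b) metadata title in the SAX output
    final boolean writeMetadataToXhtml;     // metadata written into the XHTML head

    SaxOutputConfig(boolean includeEmbeddedFilenames,
                    boolean includeMetadataTitle,
                    boolean writeMetadataToXhtml) {
        this.includeEmbeddedFilenames = includeEmbeddedFilenames;
        this.includeMetadataTitle = includeMetadataTitle;
        this.writeMetadataToXhtml = writeMetadataToXhtml;
    }
}

// Hypothetical stand-in for the component that writes the XHTML head.
final class HeadWriter {
    private final SaxOutputConfig config;

    HeadWriter(SaxOutputConfig config) { this.config = config; }

    // Returns head elements as strings; values are not escaped in this sketch.
    List<String> headElements(String title, Map<String, String> metadata) {
        List<String> out = new ArrayList<>();
        if (config.writeMetadataToXhtml) {
            for (Map.Entry<String, String> e : metadata.entrySet()) {
                out.add("<meta name=\"" + e.getKey() + "\" content=\"" + e.getValue() + "\"/>");
            }
        }
        if (config.includeMetadataTitle && title != null) {
            out.add("<title>" + title + "</title>");
        }
        return out;
    }
}

// Hypothetical stand-in for the component that marks embedded documents.
final class EmbeddedMarker {
    private final SaxOutputConfig config;

    EmbeddedMarker(SaxOutputConfig config) { this.config = config; }

    String startEmbedded(String filename) {
        if (config.includeEmbeddedFilenames && filename != null) {
            return "<div class=\"embedded\" data-filename=\"" + filename + "\">";
        }
        return "<div class=\"embedded\">";
    }
}

public class UnifiedSaxConfigDemo {
    public static void main(String[] args) {
        // One object configured by the user, shared by both components.
        SaxOutputConfig config = new SaxOutputConfig(true, true, false);
        HeadWriter head = new HeadWriter(config);
        EmbeddedMarker embedded = new EmbeddedMarker(config);

        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("Content-Type", "application/pdf");

        System.out.println(head.headElements("Quarterly report", metadata));
        System.out.println(embedded.startEmbedded("attachment.docx"));
    }
}
```

Passing the same immutable instance to every component keeps the user-facing surface to a single object, while the question of which component honors which flag remains an internal detail.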



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
