[ https://issues.apache.org/jira/browse/TIKA-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17896423#comment-17896423 ]
ASF GitHub Bot commented on TIKA-4345:
--------------------------------------
tballison merged PR #2037:
URL: https://github.com/apache/tika/pull/2037
> Allow body-only content extraction for msg and other email formats
> ------------------------------------------------------------------
>
> Key: TIKA-4345
> URL: https://issues.apache.org/jira/browse/TIKA-4345
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Minor
>
> At least in the OutlookExtractor, we're writing some of the headers into the
> content stream. For some use cases, it would be helpful to extract only the
> body content into the content stream.
> It looks like OutlookExtractor and maybe OutlookPSTParser are the only
> parsers that need to be modified. The RFC822Parser does not write the
> from/to, etc. headers into the content stream.
> I propose that this be a non-breaking/opt-in option in 3.x, and then the
> default in 4.x.