[
https://issues.apache.org/jira/browse/TIKA-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4530.
-------------------------------
Fix Version/s: 4.0.0, 3.3.0
Resolution: Fixed
> Don't let body content slip into headers in MboxParser
> ------------------------------------------------------
>
> Key: TIKA-4530
> URL: https://issues.apache.org/jira/browse/TIKA-4530
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Minor
> Fix For: 4.0.0, 3.3.0
>
>
> On an mbox file that's part of the ipres2025 Digital Preservation Bakeoff, I
> noticed that we were getting content types that looked like this:
> {{message/rfc822, multipart/alternative; a {text-decoration:
> none;text-decoration:none!important;} <t...}}.
>
> The problem is that we're caching what look like multiline header bits
> whether or not we're in an rfc822 header within an mbox file. We should stop
> caching multiline bits if we're not in a header.
>
> [https://www.ipres2025.nz/post/ipres-tools-demo-session-the-digital-preservation-bake-off]
>
> Pantry:
> [https://drive.google.com/drive/folders/1_BFjNw95HhH45VO-Y2gmJTbd6kfYAtuY]
> The file is from edrm: [email protected]:
> https://drive.google.com/drive/folders/1gpUbxmb8-AL2r1eqCODLzeT7QBN1LW9S
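
The fix described above (only cache folded continuation lines while inside an rfc822 header) can be illustrated with a minimal sketch. This is not the actual MboxParser code: the class name, method name, and the simplified mbox handling are all hypothetical, and it assumes a message starts at a line beginning with "From " and that its header block ends at the first blank line.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Tika's MboxParser.
public class MboxHeaderSketch {

    /** Returns the headers of the first message in the given mbox lines. */
    public static Map<String, String> parseFirstMessageHeaders(List<String> lines) {
        Map<String, String> headers = new LinkedHashMap<>();
        boolean seenMessage = false;
        boolean inHeader = false;   // true only between "From " and the first blank line
        String currentName = null;  // header that a continuation line would extend

        for (String line : lines) {
            if (line.startsWith("From ")) {
                if (seenMessage) {
                    break;          // second message: stop, we only want the first
                }
                seenMessage = true;
                inHeader = true;
                currentName = null;
                continue;
            }
            if (inHeader && line.isEmpty()) {
                inHeader = false;   // blank line ends the header block
                currentName = null;
                continue;
            }
            if (!inHeader) {
                continue;           // body line: never treat it as a header continuation
            }
            boolean isContinuation = line.startsWith(" ") || line.startsWith("\t");
            if (isContinuation && currentName != null) {
                // Folded header line: append to the header currently being built.
                headers.put(currentName, headers.get(currentName) + " " + line.trim());
                continue;
            }
            int colon = line.indexOf(':');
            if (colon > 0) {
                currentName = line.substring(0, colon).trim();
                headers.put(currentName, line.substring(colon + 1).trim());
            }
        }
        return headers;
    }
}
```

With this guard, an indented CSS line in the body such as " a {text-decoration: none;}" is skipped rather than appended to the last header seen, so Content-Type keeps only its genuine folded parameters.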
--
This message was sent by Atlassian Jira
(v8.20.10#820010)