[ 
https://issues.apache.org/jira/browse/TIKA-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-4530.
-------------------------------
    Fix Version/s: 4.0.0
                   3.3.0
       Resolution: Fixed

> Don't let body content slip into headers in MboxParser
> ------------------------------------------------------
>
>                 Key: TIKA-4530
>                 URL: https://issues.apache.org/jira/browse/TIKA-4530
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>             Fix For: 4.0.0, 3.3.0
>
>
> On an mbox file that's part of the ipres2025 Digital Preservation Bakeoff, I 
> noticed that we were getting content types that looked like this: 
> {{message/rfc822, multipart/alternative; a {text-decoration: 
> none;text-decoration:none!important;} <t...}}.
>  
> The problem is that we cache anything that looks like a multiline (folded) 
> header continuation, whether or not we're actually inside an rfc822 header 
> block within the mbox file. We should stop caching continuation lines once 
> we're no longer in a header.
>  
> [https://www.ipres2025.nz/post/ipres-tools-demo-session-the-digital-preservation-bake-off]
>  
> Pantry: 
> [https://drive.google.com/drive/folders/1_BFjNw95HhH45VO-Y2gmJTbd6kfYAtuY]
> The file is from edrm: [email protected]: 
> https://drive.google.com/drive/folders/1gpUbxmb8-AL2r1eqCODLzeT7QBN1LW9S
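
To illustrate the fix described in the issue, here is a minimal sketch of header folding that is gated on an "in header" flag. It is not the actual MboxParser code: the class name MboxHeaderSketch, the parse method, and the simplified mbox rules (a line starting with "From " opens a message, the first blank line ends its headers) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Tika's implementation.
class MboxHeaderSketch {

    /**
     * Returns one header map per message. Indented lines are folded into the
     * previous header only while we are still inside that message's header block.
     */
    static List<Map<String, String>> parse(List<String> lines) {
        List<Map<String, String>> messages = new ArrayList<>();
        Map<String, String> headers = null;
        boolean inHeader = false;   // true only between a "From " line and the first blank line
        String lastName = null;     // header that a continuation line would extend

        for (String line : lines) {
            if (line.startsWith("From ")) {
                // Assumed mbox separator: start a new message and its header block.
                headers = new LinkedHashMap<>();
                messages.add(headers);
                inHeader = true;
                lastName = null;
                continue;
            }
            if (!inHeader) {
                // Body line: never cache it, even if it looks like a continuation.
                continue;
            }
            if (line.isEmpty()) {
                // A blank line ends the header block.
                inHeader = false;
                lastName = null;
                continue;
            }
            char first = line.charAt(0);
            if (first == ' ' || first == '\t') {
                // Folded continuation of the previous header.
                if (lastName != null) {
                    headers.merge(lastName, line.trim(), (a, b) -> a + " " + b);
                }
                continue;
            }
            int colon = line.indexOf(':');
            if (colon > 0) {
                lastName = line.substring(0, colon).trim();
                headers.put(lastName, line.substring(colon + 1).trim());
            } else {
                lastName = null;
            }
        }
        return messages;
    }
}
```

With this gating, an indented CSS line in a message body, such as " a {text-decoration: none;}", is skipped rather than appended to the preceding Content-Type value. Dropping the inHeader check would reproduce the kind of polluted content type quoted in the issue.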



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
