[
https://issues.apache.org/jira/browse/TIKA-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16216178#comment-16216178
]
Tim Allison edited comment on TIKA-2478 at 10/24/17 1:54 AM:
-------------------------------------------------------------
bq. The important thing to note here is that, in multipart MIME messages, it is
perfectly valid to have parts within parts. In theory, that nesting can extend
to any depth. Any reasonably capable email client should then be able to
recursively process all of the message parts.
https://stackoverflow.com/questions/3902455/mail-multipart-alternative-vs-multipart-mixed
Yikes...as [~kkrugler] pointed out above...
was (Author: [email protected]):
bq. The important thing to note here is that, in multipart MIME messages, it is
perfectly valid to have parts within parts. In theory, that nesting can extend
to any depth. Any reasonably capable email client should then be able to
recursively process all of the message parts.
https://stackoverflow.com/questions/3902455/mail-multipart-alternative-vs-multipart-mixed
Yikes!
> MBOX import includes redundant copies of the text
> -------------------------------------------------
>
> Key: TIKA-2478
> URL: https://issues.apache.org/jira/browse/TIKA-2478
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.16
> Reporter: Robert Letzler
> Assignee: Tim Allison
> Priority: Minor
> Attachments: UET6KCXR5FYIEJYKUCK2AKF3FLXTRNAT.eml, mixed-simple,
> mixed-with-pdf-inline
>
>
> MBOX messages often get parsed into four documents:
> a. The mbox file, the outer container: "/"
> b. The actual email: "/embedded-1"
> c. The UTF-8 text content of the email: "/embedded-1/embedded-2"
> d. The UTF-8 HTML content of the email: "/embedded-1/embedded-3"
> Entries c and d are redundant and distracting. The MSG parser parses the
> first non-null email body and then skips the rest. Please modify the MBOX
> parser so that it does not emit separate "attached" documents for the HTML
> body and the text body.
> The attachment to https://issues.apache.org/jira/browse/TIKA-2471 is an
> example of input sufficient to generate this behavior.
> Thanks!
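
For illustration only, here is a minimal sketch of the behavior requested above, written against Python's standard `email` module rather than Tika's own parsers. It recurses through nested multipart containers (the "parts within parts" from the quoted comment) and returns only the first non-empty text body, instead of reporting the text/plain and text/html alternatives as separate documents. The names `first_body`, `text_leaves`, and `SAMPLE` are hypothetical, and nothing here is meant to describe how Tika implements or should implement the fix.

```python
from email import message_from_string
from email.message import Message

# Hypothetical sample: multipart/mixed wrapping a multipart/alternative
# that carries the same body as text/plain and as text/html.
SAMPLE = """\
From: a@example.com
To: b@example.com
Subject: demo
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/plain; charset="utf-8"

Hello
--inner
Content-Type: text/html; charset="utf-8"

<p>Hello</p>
--inner--

--outer--
"""


def first_body(part: Message):
    """Return (content_type, text) for the first non-empty text body, or None."""
    if part.is_multipart():
        # Containers can nest to any depth, so recurse into each child in order.
        for child in part.get_payload():
            found = first_body(child)
            if found is not None:
                return found
        return None
    if part.get_content_maintype() != "text":
        return None
    if part.get_content_disposition() == "attachment":
        return None  # a real text attachment is not a message body
    raw = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "utf-8"
    text = raw.decode(charset, errors="replace").strip()
    return (part.get_content_type(), text) if text else None


def text_leaves(msg: Message):
    """List the content types of every text leaf, i.e. what a naive walk reports."""
    return [
        p.get_content_type()
        for p in msg.walk()
        if not p.is_multipart() and p.get_content_maintype() == "text"
    ]


if __name__ == "__main__":
    message = message_from_string(SAMPLE)
    print(text_leaves(message))  # every text part, including the redundant one
    print(first_body(message))   # only the first non-empty body
```

A naive walk over the sample reports both text parts, which corresponds to entries c and d in the description; `first_body` stops at the first one that has content. Skipping parts whose disposition is `attachment` is an assumption of this sketch, chosen so that genuine text attachments are not mistaken for the body.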