[ 
https://issues.apache.org/jira/browse/TIKA-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535351#comment-16535351
 ] 

Yury Kats edited comment on TIKA-2680 at 7/6/18 9:07 PM:
---------------------------------------------------------

Indeed, the first embedded rfc822 part is not an attachment. I believe this is 
because it's an Exchange journaled email; note the presence of the 
X-MS-Journal-Report header at the very top. 
In this case, the original message is wrapped in another message that can 
provide additional headers, such as Bcc and expanded distribution lists.
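
To illustrate the wrapping described above, here is a minimal sketch using Python's standard `email` package (not Tika). It treats a message as a journal envelope if it carries an X-MS-Journal-Report header, and in that case returns the first embedded message/rfc822 part as the original message. The function name `unwrap_journal_report` and the `SAMPLE` message are hypothetical, made up for this example; real journal reports contain more than this.

```python
from email import message_from_string, policy


def unwrap_journal_report(msg):
    """Return the wrapped original message if msg looks like an Exchange
    journal envelope; otherwise return msg unchanged.

    Assumption: the envelope is identified solely by the presence of the
    X-MS-Journal-Report header (its value may be empty).
    """
    if "X-MS-Journal-Report" not in msg:
        return msg
    # walk() visits the envelope first, then its parts in document order.
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            # A message/rfc822 part holds exactly one embedded message.
            return part.get_payload(0)
    return msg


# Hypothetical, heavily simplified journal envelope for illustration only.
SAMPLE = """\
X-MS-Journal-Report:
Subject: Journal envelope
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: text/plain

Envelope data (for example Bcc recipients) would appear here.
--outer
Content-Type: message/rfc822

Subject: Original message
Content-Type: text/plain

Hello
--outer--
"""

envelope = message_from_string(SAMPLE, policy=policy.default)
original = unwrap_journal_report(envelope)
print(original["Subject"])
```

A message without the header is returned as-is, so the helper can be applied to every input before further processing.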


was (Author: yurykats):
Indeed, the first embedded rfc822 is not an attachment. I believe this is 
because it's an Exchange journaled email, see the presence of 
X-MS-Journal-Report header at the very top. 

> Email attachments to an email are not extracted
> -----------------------------------------------
>
>                 Key: TIKA-2680
>                 URL: https://issues.apache.org/jira/browse/TIKA-2680
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.18
>            Reporter: Yury Kats
>            Assignee: Tim Allison
>            Priority: Major
>         Attachments: nested.eml
>
>
> I have a number of email messages that contain other email messages as 
> attachments (with multiple levels of nesting).
> The email attachments are parts with "Content-Type: message/rfc822" but are 
> not being recognized as such.
> Attached is an example email, with the multiple levels of attachments:
>  * Subject: Test email within email
>  ** Subject: Email within email test
>  *** Subject: Stand-up today
>  
> I would like to see 3 separate emails parsed out (top level, 1st level 
> attached email, 2nd level attached email), but I only get 1 email and 1 
> unnamed text attachment:
> {noformat}
> $ java -jar tika-app-1.18.jar -m -J nested.eml | python -m json.tool
> [
> {
> "Author": "Smith Van der, H (Henry) <[email protected]>",
> "Content-Length": "16649",
> "Content-Type": "message/rfc822",
> "Creation-Date": "2018-04-25T12:46:41Z",
> "Message-From": "Smith Van der, H (Henry) <[email protected]>",
> "Message-To": [
> "fm.SAN Management Team <[email protected]>",
> "Smith Van der, H (Henry) <[email protected]>"
> ],
> "Message:From-Email": "[email protected]",
> "Message:From-Name": "Smith Van der, H (Henry)",
> "Message:Raw-Header:Auto-Submitted": "auto-generated",
> "Message:Raw-Header:Content-Transfer-Encoding": "binary",
> "Message:Raw-Header:Keywords": "",
> "Message:Raw-Header:MIME-Version": "1.0",
> "Message:Raw-Header:Message-ID": 
> "<[email protected]>",
> "Message:Raw-Header:Return-Path": "<>",
> "Message:Raw-Header:Sender": 
> "<[email protected]>",
> "Message:Raw-Header:X-MS-Exchange-Generated-Message-Source": "Journal Agent",
> "Message:Raw-Header:X-MS-Exchange-Parent-Message-Id": 
> "<[email protected]>",
> "Message:Raw-Header:X-MS-Journal-Report": "",
> "Multipart-Boundary": "_728aa617-16cf-4d95-8bc2-9f1868397202_",
> "Multipart-Subtype": "mixed",
> "X-Parsed-By": [
> "org.apache.tika.parser.DefaultParser",
> "org.apache.tika.parser.mail.RFC822Parser"
> ],
> "X-TIKA:parse_time_millis": "325",
> "creator": "Smith Van der, H (Henry) <[email protected]>",
> "dc:creator": "Smith Van der, H (Henry) <[email protected]>",
> "dc:title": "Test email within email",
> "dcterms:created": "2018-04-25T12:46:41Z",
> "meta:author": "Smith Van der, H (Henry) <[email protected]>",
> "meta:creation-date": "2018-04-25T12:46:41Z",
> "resourceName": "nested.eml",
> "subject": "Test email within email"
> },
> {
> "Content-Encoding": "US-ASCII",
> "Content-Type": "text/plain; charset=US-ASCII",
> "Multipart-Boundary": 
> "_004_8075737674787666767166806676697476787366657271727266777_",
> "Multipart-Subtype": "mixed",
> "X-Parsed-By": [
> "org.apache.tika.parser.DefaultParser",
> "org.apache.tika.parser.txt.TXTParser"
> ],
> "X-TIKA:embedded_resource_path": "/embedded-1",
> "X-TIKA:parse_time_millis": "5",
> "embeddedResourceType": "ATTACHMENT"
> }
> ]
> {noformat}
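
As a cross-check of the structure the description expects, the nesting of message/rfc822 parts can be listed independently of Tika. The following is a minimal sketch using Python's standard `email` package; the function names `nested_subjects` and `_descend` and the `SAMPLE` message are hypothetical and only mimic the three subjects listed above, not the attached nested.eml.

```python
from email import message_from_string, policy


def nested_subjects(msg, depth=0):
    """Yield (depth, subject) for msg and for every message embedded in it
    as a message/rfc822 part, at any level of nesting."""
    yield depth, str(msg.get("Subject", ""))
    yield from _descend(msg, depth)


def _descend(part, depth):
    if part.get_content_type() == "message/rfc822":
        # The single payload item is the embedded message itself.
        yield from nested_subjects(part.get_payload(0), depth + 1)
    elif part.is_multipart():
        for child in part.get_payload():
            yield from _descend(child, depth)


# Hypothetical three-level message used only for illustration.
SAMPLE = """\
Subject: Test email within email
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="b1"

--b1
Content-Type: text/plain

top-level body
--b1
Content-Type: message/rfc822

Subject: Email within email test
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="b2"

--b2
Content-Type: text/plain

first-level body
--b2
Content-Type: message/rfc822

Subject: Stand-up today
Content-Type: text/plain

second-level body
--b2--
--b1--
"""

top = message_from_string(SAMPLE, policy=policy.default)
for depth, subject in nested_subjects(top):
    print("  " * depth + subject)
```

Running the same walk over a real file (parsed with `email.message_from_binary_file`) shows how many message/rfc822 levels the MIME structure actually contains, which helps separate parser issues from envelope wrapping.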



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
