Yury Kats created TIKA-2685:
-------------------------------

             Summary: Emails attached to an undeliverable email report are not extracted
                 Key: TIKA-2685
                 URL: https://issues.apache.org/jira/browse/TIKA-2685
             Project: Tika
          Issue Type: Bug
    Affects Versions: 1.18
            Reporter: Yury Kats
         Attachments: undeliverable.eml

I have a number of email messages that are reports of undeliverable emails and 
that contain the original email message as an attachment.

The original emails are parts with "Content-Type: message/rfc822" but are not 
being recognized as such.
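
To make the expected structure concrete, here is a minimal sketch that walks the MIME tree of a bounce report with Python's standard `email` module. It is not Tika code and does not use the attached file: the message in `RAW` is a synthetic, simplified report (the addresses, the boundary `b1`, and the omission of the usual delivery-status part are all assumptions for illustration), and `list_parts` is a hypothetical helper name.

```python
from email import message_from_string

# Synthetic, simplified bounce report (NOT the attached undeliverable.eml).
# Addresses, boundary and body text are made up for illustration only.
RAW = """\
From: postmaster@example.com
To: sender@example.com
Subject: Undeliverable: test
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="b1"

--b1
Content-Type: text/plain; charset=us-ascii

Delivery has failed.
--b1
Content-Type: message/rfc822

From: sender@example.com
To: nobody@example.com
Subject: test

Original body.
--b1--
"""


def list_parts(msg, depth=0):
    """Return (depth, content type, Subject header) for every node in the tree."""
    rows = [(depth, msg.get_content_type(), msg.get("Subject"))]
    # For a message/rfc822 part, is_multipart() is true and the payload is a
    # one-element list holding the attached message, so recursion reaches it.
    if msg.is_multipart():
        for part in msg.get_payload():
            rows.extend(list_parts(part, depth + 1))
    return rows


rows = list_parts(message_from_string(RAW))
for depth, ctype, subject in rows:
    print("  " * depth + ctype, "| Subject:", subject)
```

In this sketch the `message/rfc822` part shows up as its own node with the original message (and its own Subject) nested beneath it, which is the structure I would expect Tika to report as a second email.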

Attached is an example email:
 * Subject: Undeliverable: SRE Agent Out of Space Source:WindowsApp
 ** Subject: SRE Agent Out of Space Source:WindowsApp
 
I would like to see 2 separate emails parsed out (top level undeliverable 
report, 1st level attached original email), but I get 1 email and 2 unnamed 
text attachments:

{noformat}
$ java -jar tika-app-1.18.jar -m -J /tmp/undeliverable.eml | python -m json.tool
[
    {
        "Author": "[email protected]",
        "Content-Length": "17356",
        "Content-Type": "message/rfc822",
        "Creation-Date": "2017-11-04T16:00:11Z",
        "Message-From": "[email protected]",
        "Message-To": "[email protected]",
        "Message:From-Email": "[email protected]",
        "Message:Raw-Header:Auto-Submitted": "auto-generated",
        "Message:Raw-Header:MIME-Version": "1.0",
        "Message:Raw-Header:Message-ID": "<[email protected]>",
        "Message:Raw-Header:Return-Path": "<>",
        "Message:Raw-Header:Sender": "<[email protected]>",
        "Message:Raw-Header:X-MS-Exchange-Generated-Message-Source": "Journal Agent",
        "Message:Raw-Header:X-MS-Exchange-Message-Is-Ndr": "",
        "Message:Raw-Header:X-MS-Exchange-Parent-Message-Id": "\t<[email protected]>",
        "Message:Raw-Header:X-MS-Journal-Report": "",
        "Multipart-Boundary": "_5a8d7320-7cd6-4c1b-8e30-9616634562b2_",
        "Multipart-Subtype": "mixed",
        "X-Parsed-By": [
            "org.apache.tika.parser.DefaultParser",
            "org.apache.tika.parser.mail.RFC822Parser"
        ],
        "X-TIKA:parse_time_millis": "326",
        "creator": "[email protected]",
        "dc:creator": "[email protected]",
        "dc:title": "Undeliverable: SRE Agent Out of Space Source:WindowsApp",
        "dcterms:created": "2017-11-04T16:00:11Z",
        "meta:author": "[email protected]",
        "meta:creation-date": "2017-11-04T16:00:11Z",
        "resourceName": "undeliverable.eml",
        "subject": "Undeliverable: SRE Agent Out of Space Source:WindowsApp"
    },
    {
        "Content-Encoding": "windows-1252",
        "Content-Type": "text/plain; charset=windows-1252",
        "Multipart-Boundary": "_dd8c2c7d-5333-4f9a-a282-d2056075e7aa_",
        "Multipart-Subtype": "report",
        "X-Parsed-By": [
            "org.apache.tika.parser.DefaultParser",
            "org.apache.tika.parser.txt.TXTParser"
        ],
        "X-TIKA:embedded_resource_path": "/embedded-1",
        "X-TIKA:parse_time_millis": "4",
        "embeddedResourceType": "ATTACHMENT"
    },
    {
        "Content-Encoding": "US-ASCII",
        "Content-Type": "text/html; charset=US-ASCII",
        "Multipart-Boundary": "_dd8c2c7d-5333-4f9a-a282-d2056075e7aa_",
        "Multipart-Subtype": "report",
        "X-Parsed-By": [
            "org.apache.tika.parser.DefaultParser",
            "org.apache.tika.parser.html.HtmlParser"
        ],
        "X-TIKA:embedded_resource_path": "/embedded-2",
        "X-TIKA:parse_time_millis": "7",
        "embeddedResourceType": "ATTACHMENT"
    }
]
{noformat}
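
The mismatch can also be checked mechanically. The following is a rough sketch, assuming the JSON list printed by `-m -J` is available as text; `count_emails` is a hypothetical helper, and `SAMPLE` is a trimmed stand-in for the output above that keeps only a few keys.

```python
import json


def count_emails(tika_json_text):
    """Count entries whose Content-Type marks them as an email message."""
    entries = json.loads(tika_json_text)
    return sum(
        1
        for entry in entries
        # str() keeps the check from failing if a value is not a plain string.
        if str(entry.get("Content-Type", "")).startswith("message/rfc822")
    )


# Trimmed stand-in for the output above (only a few keys kept).
SAMPLE = """[
  {"Content-Type": "message/rfc822", "resourceName": "undeliverable.eml"},
  {"Content-Type": "text/plain; charset=windows-1252",
   "X-TIKA:embedded_resource_path": "/embedded-1"},
  {"Content-Type": "text/html; charset=US-ASCII",
   "X-TIKA:embedded_resource_path": "/embedded-2"}
]"""

found = count_emails(SAMPLE)
print("message/rfc822 entries found:", found, "(expected 2)")
```

With the output above this reports one `message/rfc822` entry, whereas I would expect two: the undeliverable report and the attached original email.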




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
