[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272297#comment-14272297 ]
Luis Filipe Nassif commented on TIKA-623:
-----------------------------------------
I think OutlookPSTParser currently does not extract .msg files, because they
do not exist inside a PST: mails are stored broken into several pieces. Looking
at the source, it seems to extract and process raw text mail bodies and
attachments, even if you set up the parsing to recurse down only one level.
To get the relationship between a mail and its attachments, I think you
currently need to monitor the handler output. I think the parser could be
improved to set a parent mail id in the metadata of its attachments, and vice
versa, to make it easier to recover the relationships.
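A rough sketch of that idea, in plain Java with no Tika dependency: each map
stands in for the metadata of one item emitted by the parser. The keys
"itemId" and "parentMailId" and the class name are hypothetical, made up for
illustration; they are not existing Tika properties or parser behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MailRelationshipSketch {

    // Hypothetical metadata keys, assumed for this sketch only.
    static final String ITEM_ID = "itemId";
    static final String PARENT_MAIL_ID = "parentMailId";

    /**
     * Groups attachment ids under the id of the mail they belong to.
     * Items without a parent id are treated as mails.
     */
    static Map<String, List<String>> attachmentsByMail(List<Map<String, String>> items) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map<String, String> item : items) {
            String parent = item.get(PARENT_MAIL_ID);
            if (parent == null) {
                // A mail: give it an entry even if it has no attachments.
                result.computeIfAbsent(item.get(ITEM_ID), k -> new ArrayList<>());
            } else {
                // An attachment: file it under its parent mail.
                result.computeIfAbsent(parent, k -> new ArrayList<>()).add(item.get(ITEM_ID));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> items = new ArrayList<>();
        items.add(Map.of(ITEM_ID, "mail-1"));
        items.add(Map.of(ITEM_ID, "att-1", PARENT_MAIL_ID, "mail-1"));
        items.add(Map.of(ITEM_ID, "att-2", PARENT_MAIL_ID, "mail-1"));
        items.add(Map.of(ITEM_ID, "mail-2"));
        // Prints {mail-1=[att-1, att-2], mail-2=[]}
        System.out.println(attachmentsByMail(items));
    }
}
```

With ids like these in the metadata, a consumer could rebuild the mail to
attachment mapping from the metadata alone, without tracking the order of
handler events.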
> Add support for Outlook PST
> ---------------------------
>
> Key: TIKA-623
> URL: https://issues.apache.org/jira/browse/TIKA-623
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Reporter: Tran Nam Quang
> Fix For: 1.6
>
> Attachments: OutlookPSTParser.java
>
>
> Hello everyone,
> As you might know, Outlook stores its mails and other stuff in a single PST
> file. There's a relatively new Java library called java-libpst for reading
> Outlook PST files. It is licensed under the LGPL and available over here:
> http://code.google.com/p/java-libpst/
> I have tested the library on Outlook 2000 and Outlook 2003, with good
> results. It would be great if the library could be integrated into Tika.
> Best regards
> Tran Nam Quang
--
This message was sent by Atlassian JIRA (v6.3.4#6332)