[ https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844976#comment-17844976 ]
Tim Allison commented on TIKA-4250:
-----------------------------------
libpff issue opened: https://github.com/libyal/libpff/issues/128
Note that I found non-deterministic behavior even with debugging off: sometimes
I got 7 extracted files, sometimes 8. I noted that in the issue.
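
For anyone who wants to reproduce the varying file count, here is a minimal Python sketch of one way to check it: run the same extraction several times into fresh directories and tally how many files each run produces. The helper names and the `build_command` callback are hypothetical, and the actual extractor command line is left to the caller; this is not the procedure used for the libpff report.

```python
import subprocess
import tempfile
from collections import Counter
from pathlib import Path


def count_extracted_files(out_dir):
    """Count regular files anywhere under out_dir."""
    return sum(1 for p in Path(out_dir).rglob("*") if p.is_file())


def tally_runs(build_command, runs=10):
    """Run the extractor `runs` times and tally the file counts seen.

    build_command(out_dir) is a hypothetical callback that returns the
    argument list for one extraction into out_dir.
    """
    tally = Counter()
    for _ in range(runs):
        # A fresh directory per run, so earlier output cannot leak in.
        with tempfile.TemporaryDirectory() as out_dir:
            subprocess.run(build_command(out_dir), check=False)
            tally[count_extracted_files(out_dir)] += 1
    return tally
```

A tally with more than one distinct file count across runs on the same input would confirm the non-determinism described above.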
> Add a libpst-based parser
> -------------------------
>
> Key: TIKA-4250
> URL: https://issues.apache.org/jira/browse/TIKA-4250
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
> Attachments: 8.eml, 8.msg
>
>
> We currently use the com.pff Java-based PST parser for PST files. It would be
> useful to add a wrapper for libpst as an optional parser.
> One of the benefits of libpst is that it creates .eml or .msg files from the
> PST records. This is critical for those who want the original bytes from
> embedded files. Obviously, PST doesn't store .eml or .msg files, but some users
> want the "original" emails even if they are constructed from PST records.
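
As a rough illustration of what wrapping libpst could look like, the Python sketch below shells out to its `readpst` command-line tool and collects the files it writes. It assumes `readpst` is on the PATH and that `-e` (one file per message, with extensions such as .eml), `-m` (as `-e`, but also write .msg files), and `-o` (output directory) behave as described in the readpst man page; check them against the installed version. The function names are hypothetical, and this is not Tika's implementation.

```python
import shutil
import subprocess
from pathlib import Path


def readpst_command(pst_path, out_dir, with_msg=False):
    """Build the readpst argument list (flags assumed from the man page)."""
    flag = "-m" if with_msg else "-e"
    return ["readpst", flag, "-o", str(out_dir), str(pst_path)]


def extract(pst_path, out_dir, with_msg=False):
    """Run readpst and return the extracted files in sorted order."""
    if shutil.which("readpst") is None:
        raise FileNotFoundError("readpst not found on PATH")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    subprocess.run(readpst_command(pst_path, out, with_msg), check=True)
    return sorted(p for p in out.rglob("*") if p.is_file())
```

A wrapper along these lines would hand each returned file to the existing email parsers, keeping the constructed .eml or .msg bytes available as the "original" message.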
--
This message was sent by Atlassian Jira
(v8.20.10#820010)