[
https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843740#comment-17843740
]
Tim Allison edited comment on TIKA-4250 at 5/6/24 1:02 PM:
-----------------------------------------------------------
Wow. This is super helpful. I guess the answer is to run all three?
But seriously, should we fork java-libpst and add your extra fixes? Or, better,
try to push them into the actual java-libpst? Longer term, we could see about
adding meetings, Documents, Notes, vCalendars and vJournals into that fork?
This gives some confidence that we were doing well with java-libpst.
In my own, much more modest testing (one large pst), I noticed that libpst had
fewer emails and fewer attachments. What was weird, though, was that the number
of emails was equal or closer to equal when I turned debug mode on in libpst.
It was much, much slower, but it got the same number of emails as java-libpst.
Again, thank you!
> Add a libpst-based parser
> -------------------------
>
> Key: TIKA-4250
> URL: https://issues.apache.org/jira/browse/TIKA-4250
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
> We currently use the com.pff Java-based PST parser for PST files. It would be
> useful to add a wrapper for libpst as an optional parser.
> One of the benefits of libpst is that it creates .eml or .msg files from the
> PST records. This is critical for those who want the original bytes from
> embedded files. Obv, PST doesn't store eml or msg, but some users want the
> "original" emails even if they are constructed from PST records.
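
To make the wrapper idea in the description above a bit more concrete, here is a
minimal sketch of shelling out to libpst's readpst tool and collecting the .eml
files it writes. It assumes readpst is installed and on the PATH. The function
names, the timeout, and the directory layout are hypothetical and are not taken
from Tika or from the issue. Python is used only to keep the sketch short.

```python
import shutil
import subprocess
from pathlib import Path


def build_readpst_command(pst_path, out_dir, executable="readpst"):
    """Build the argument list for a readpst run (hypothetical helper).

    -e  write each message to its own file, with a file extension (.eml)
    -o  directory that readpst writes its output into
    """
    return [executable, "-e", "-o", str(out_dir), str(pst_path)]


def extract_eml(pst_path, out_dir, timeout_seconds=600):
    """Run readpst on one PST file and return the .eml files it produced."""
    if shutil.which("readpst") is None:
        # Optional parser: the caller can fall back to another PST parser.
        raise FileNotFoundError("readpst not found on PATH")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        build_readpst_command(pst_path, out),
        check=True,               # raise if readpst exits with an error
        timeout=timeout_seconds,  # assumed limit; large PSTs may need more
    )
    # readpst mirrors the PST folder tree, so search recursively.
    return sorted(out.rglob("*.eml"))
```

A caller would use something like extract_eml("archive.pst", "out") and then
hand each returned file to the regular email parser. Counting the returned files
is also a quick way to compare against what java-libpst finds in the same PST.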