[
https://issues.apache.org/jira/browse/TIKA-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842471#comment-17842471
]
Hudson commented on TIKA-4248:
------------------------------
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1617 (See
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1617/])
TIKA-4248 -- improve handling of attachments in PST (#1738) (github:
[https://github.com/apache/tika/commit/de282d2861009895eecdb07784dceb5d777f372a])
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/java/org/apache/tika/parser/microsoft/pst/OutlookPSTParserTest.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-html-module/src/main/java/org/apache/tika/parser/html/JSoupParser.java
* (edit) tika-core/src/main/java/org/apache/tika/metadata/Office.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/pst/OutlookPSTParser.java
* (add) tika-core/src/main/java/org/apache/tika/metadata/PST.java
* (edit) CHANGES.txt
* (add) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/pst/PSTMailItemParser.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/resources/META-INF/services/org.apache.tika.parser.Parser
> Improve PST handling of attachments
> -----------------------------------
>
> Key: TIKA-4248
> URL: https://issues.apache.org/jira/browse/TIKA-4248
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
> The PST parser doesn't handle attachments in quite the same way as other
> parsers, which hinders analysis of attachments.
> The problem is that the PST parser itself handles both the text content of an
> email and its embedded attachments, and it processes the attachments before
> the main body. Together, these two behaviors break the normal patterns for
> embedded attachments in the RecursiveParserWrapper. For example, while the
> attachments are being processed, the RecursiveParserWrapper can't figure out
> what the path will be through the "body" because the body hasn't been parsed yet.
> We should probably create a PSTMailItemParser that handles the content and
> the attachments like other parsers so that embedded paths can be maintained.
> This will be a breaking change, and I'm targeting it only to the 3.x branch.
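
The ordering problem described above can be illustrated with a small, self-contained model. This is only a sketch and not Tika code: the class `EmbeddedPathSketch`, its methods, and the names `mail.pst`, `message-1`, and `report.pdf` are all hypothetical. It assumes that an embedded path is derived from the stack of containers that are open at the moment an item is parsed, which is a simplification of what the RecursiveParserWrapper does.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model, not Tika code: the path of each item is taken from
// the stack of containers that are open when the item starts parsing.
public class EmbeddedPathSketch {

    private final Deque<String> open = new ArrayDeque<>();
    private final List<String> paths = new ArrayList<>();

    // Enter an item, record its path, and keep it open for any children.
    void begin(String name) {
        open.addLast(name);
        paths.add("/" + String.join("/", open));
    }

    // Leave the most recently entered item.
    void end() {
        open.removeLast();
    }

    // Monolithic style: a single parser emits the attachment before the
    // mail item that owns it has been entered as a container.
    public static List<String> attachmentsFirst() {
        EmbeddedPathSketch s = new EmbeddedPathSketch();
        s.begin("mail.pst");
        s.begin("report.pdf");   // attachment handled first
        s.end();
        s.begin("message-1");    // message body handled afterwards
        s.end();
        s.end();
        return s.paths;
    }

    // Delegating style: each mail item is its own embedded document and
    // its attachment is parsed while that item is still open.
    public static List<String> perItemParser() {
        EmbeddedPathSketch s = new EmbeddedPathSketch();
        s.begin("mail.pst");
        s.begin("message-1");
        s.begin("report.pdf");
        s.end();
        s.end();
        s.end();
        return s.paths;
    }

    public static void main(String[] args) {
        System.out.println(attachmentsFirst());
        System.out.println(perItemParser());
    }
}
```

In this model, the attachments-first ordering records the attachment as a direct child of the PST file, so its link to the owning message is lost. The per-item ordering keeps the message in the attachment's path, which is the property a separate mail item parser is meant to preserve.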
--
This message was sent by Atlassian Jira
(v8.20.10#820010)