[ https://issues.apache.org/jira/browse/TIKA-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthew Caruana Galizia updated TIKA-2280:
------------------------------------------
Description:
While the MESSAGE_FROM metadata field is extracted for both RFC 822 and
Outlook emails, it doesn't include the email address for Outlook emails.
For example, if the raw from field is "John Doe <[email protected]>", the
Outlook email parser sets MESSAGE_FROM to "John Doe" while the RFC 822 email
parser sets it to "John Doe <[email protected]>".
Currently I'm getting the from address from the RAW_HEADER_FROM field for
Outlook emails, but it would be nice to be able to rely on a single standard
field across email formats.
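To illustrate the workaround described above, here is a minimal, self-contained sketch. It does not call Tika itself: a plain Map stands in for the parser's metadata, the two key strings are assumed stand-ins for MESSAGE_FROM and RAW_HEADER_FROM, and the class name, method names, and example address are hypothetical. It also only recognises addresses written in the "Name <address>" form.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class FromFieldFallback {
    // Assumed key names for illustration; check the real constants in your Tika version.
    public static final String MESSAGE_FROM = "Message-From";
    public static final String RAW_HEADER_FROM = "Message:Raw-Header:From";

    // Matches an address in angle brackets, e.g. "<user@host>".
    private static final Pattern ANGLE_ADDR =
            Pattern.compile("<([^<>\\s]+@[^<>\\s]+)>");

    // True if the value contains an angle-bracketed email address.
    public static boolean hasAddress(String value) {
        return value != null && ANGLE_ADDR.matcher(value).find();
    }

    // Prefer MESSAGE_FROM when it already carries an address (the RFC 822 case);
    // otherwise fall back to the raw From header (the Outlook case).
    // If neither carries an address, return whatever MESSAGE_FROM held.
    public static String resolveFrom(Map<String, String> metadata) {
        String from = metadata.get(MESSAGE_FROM);
        if (hasAddress(from)) {
            return from;
        }
        String raw = metadata.get(RAW_HEADER_FROM);
        return hasAddress(raw) ? raw : from;
    }

    public static void main(String[] args) {
        // Simulated metadata as the Outlook parser is described to produce it.
        Map<String, String> outlook = new HashMap<>();
        outlook.put(MESSAGE_FROM, "John Doe");
        outlook.put(RAW_HEADER_FROM, "John Doe <john.doe@example.com>");
        System.out.println(resolveFrom(outlook));
    }
}
```

With a fix in the Outlook parser, the fallback branch would no longer be needed and callers could read MESSAGE_FROM directly for both formats.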
was:
While the MESSAGE_FROM metadata field is extracted for RFC emails, it isn't for
Outlook emails. The closest thing we have for Outlook emails is the creator
field, which only includes the name (but not the email address).
Currently I'm getting the from address from the RAW_HEADER_FROM field, but it
would be nice to be able to use a standard across email formats.
> message_from not extracted from Outlook emails
> ----------------------------------------------
>
> Key: TIKA-2280
> URL: https://issues.apache.org/jira/browse/TIKA-2280
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.14
> Reporter: Matthew Caruana Galizia
> Priority: Minor
> Labels: email, outlook, poi