[ 
https://issues.apache.org/jira/browse/TIKA-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558384#comment-16558384
 ] 

Tim Allison commented on TIKA-2694:
-----------------------------------

I'm pretty sure this is one of the ways "addresses" can be stored in Outlook.
I've seen actual email addresses in .msg files, but these Outlook Exchange
addresses are quite common, and very annoying if you're expecting actual email
addresses. If you can find the actual email address stored somewhere in the
MAPIMessage object for this file, let us know.
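
Until the real address can be located, a caller can at least tell the two
kinds of value apart. The following is a minimal sketch, not Tika or POI code:
the class `ExchangeAddress` and its methods are hypothetical names, the SMTP
check is deliberately loose, and treating the last CN component as the mailbox
alias is an assumption. It does not turn an Exchange address into an email
address.

```java
import java.util.Optional;

/** Hypothetical helper for illustration; not part of Tika or POI. */
final class ExchangeAddress {

    private ExchangeAddress() {}

    /** Index of the last "/CN=" component (case-insensitive), or -1 if none. */
    private static int lastCnIndex(String v) {
        for (int i = v.length() - 4; i >= 0; i--) {
            if (v.regionMatches(true, i, "/CN=", 0, 4)) {
                return i;
            }
        }
        return -1;
    }

    /** True if the value looks like "/O=.../OU=.../CN=...", as in the reported "From" value. */
    static boolean isLegacyExchangeDn(String value) {
        if (value == null) {
            return false;
        }
        String v = value.trim();
        return v.regionMatches(true, 0, "/O=", 0, 3) && lastCnIndex(v) >= 0;
    }

    /** Loose shape check for local@domain; this is not full address validation. */
    static boolean looksLikeSmtp(String value) {
        if (value == null) {
            return false;
        }
        String v = value.trim();
        int at = v.indexOf('@');
        return at > 0 && at == v.lastIndexOf('@') && at < v.length() - 1 && !v.contains("/");
    }

    /** Last CN component of an Exchange-style address (assumed to be the mailbox alias). */
    static Optional<String> lastCommonName(String value) {
        if (!isLegacyExchangeDn(value)) {
            return Optional.empty();
        }
        String v = value.trim();
        String cn = v.substring(lastCnIndex(v) + 4).trim();
        return cn.isEmpty() ? Optional.empty() : Optional.of(cn);
    }
}
```

For the value from this issue, `isLegacyExchangeDn` returns true and
`looksLikeSmtp` returns false, so a caller could flag the "From" value as an
Exchange address rather than report it as an email address.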

> "From" headers is not always extracted correctly on msg mails
> -------------------------------------------------------------
>
>                 Key: TIKA-2694
>                 URL: https://issues.apache.org/jira/browse/TIKA-2694
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.17
>         Environment: CentOS 7
> Windows 10
>            Reporter: Celpan Valeria
>            Priority: Major
>         Attachments: Fw Anime User Analysis.msg
>
>
> For some emails, the "From" field contains, instead of the email address, a
> value that looks like
> `/O=SONY/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=EBERGER`.
> The issue seems to be connected to the library
> `org.apache.poi:poi-scratchpad:3.17`: when running
> `org.apache.tika.parser.microsoft.OutlookExtractor::OutlookExtractor(DirectoryNode, ParserContext)`
> we get
> `this.msg.mainChunks.allChunks.SenderEmailAddress = "/O=SONY/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=EBERGER"`.
> See the attachment to reproduce this defect.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
