[ https://issues.apache.org/jira/browse/TIKA-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633433#comment-14633433 ]
Tim Allison commented on TIKA-1238:
-----------------------------------

[~rangma], Any chance you could share a test file? Do you know what the actual encoding of the msg file is?

> Update OutlookExtractor to handle codepage identification more rigorously
> -------------------------------------------------------------------------
>
>                 Key: TIKA-1238
>                 URL: https://issues.apache.org/jira/browse/TIKA-1238
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Minor
>             Fix For: 1.10
>
>
> Since OutlookExtractor's codepage detection chunk was written, POI's HSMF has
> added more robust capabilities for identifying codepages in Outlook .msg
> files. As a first step to integrating those improvements, I'll copy and
> paste some of POI's code into OutlookExtractor. As a second step, I'll
> expose more of HSMF's capabilities within POI and then factor out the
> duplicate code in Tika.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)