[ https://issues.apache.org/jira/browse/TIKA-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565701#comment-16565701 ]
Tim Allison commented on TIKA-2701:
-----------------------------------

+1 cannot describe the joy this brings me that someone cares about a) WMF b) encodings. :D Thank you!

> Text is not extracted properly from WMF files
> ---------------------------------------------
>
>                 Key: TIKA-2701
>                 URL: https://issues.apache.org/jira/browse/TIKA-2701
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.15
>            Reporter: Grigoriy Alekseev
>            Priority: Major
>             Fix For: 2.0.0
>
>         Attachments: thumbnail_1.wmf
>
>
> Text is always extracted assuming it is in cp-1252 encoding. The attached
> thumbnail_1.wmf has text in Shift JIS and is extracted incorrectly. Should be
> 普林斯.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
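For illustration only, here is a minimal Python sketch of the reported symptom. It is not Tika's code and not the fix that was committed. It assumes that the raw bytes of a WMF text record are available together with the character-set code of the font in effect. The `WMF_CHARSET_TO_CODEC` table and the `decode_wmf_text` helper are hypothetical. The numeric codes come from the CharacterSet enumeration in Microsoft's [MS-WMF] specification, and the table is deliberately partial. Decoding Shift JIS bytes as cp-1252 yields mojibake, while choosing the codec from the charset code recovers 普林斯.

```python
# Hypothetical, deliberately partial mapping from WMF font character-set
# codes (MS-WMF CharacterSet enumeration) to Python codec names.
WMF_CHARSET_TO_CODEC = {
    0: "cp1252",       # ANSI_CHARSET
    128: "shift_jis",  # SHIFTJIS_CHARSET
    129: "cp949",      # HANGUL_CHARSET
    204: "cp1251",     # RUSSIAN_CHARSET
}


def decode_wmf_text(raw: bytes, charset_code: int) -> str:
    """Decode raw WMF string bytes using the font's charset code.

    Unknown codes fall back to cp1252, which mirrors the behaviour
    described in the issue. Undecodable bytes become U+FFFD instead of
    raising an exception.
    """
    codec = WMF_CHARSET_TO_CODEC.get(charset_code, "cp1252")
    return raw.decode(codec, errors="replace")


# Simulate the attachment's text: 普林斯 stored as Shift JIS bytes.
sample = "普林斯".encode("shift_jis")

print(decode_wmf_text(sample, 0))    # decoded as cp1252: mojibake
print(decode_wmf_text(sample, 128))  # decoded as Shift JIS: 普林斯
```

The fallback to cp1252 for unknown codes is a design choice for this sketch only. A real parser might prefer a charset detector or the platform's default code page instead.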