PeterAlfredLee opened a new pull request #336: URL: https://github.com/apache/tika/pull/336
According to these web pages: [Windows-1252 character list](https://www.fileformat.info/info/charset/windows-1252/list.htm), [ISO-8859-1 character list](http://www.fileformat.info/info/charset/ISO-8859-1/list.htm), [ISO-8859-15 character list](https://www.fileformat.info/info/charset/ISO-8859-15/list.htm), there are 5 byte values (0x81, 0x8d, 0x8f, 0x90, 0x9d) that Windows-1252 does not define but that ISO-8859-1 and ISO-8859-15 do. I think we can add one more check: if the content contains any of these byte values, the charset is not Windows-1252.
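The proposed check could look roughly like the sketch below. It is not the code from this pull request: the class name `Windows1252Check` and the method `couldBeWindows1252` are hypothetical names chosen for illustration, and the sketch only shows the byte scan itself, not how it would be wired into Tika's charset detection.

```java
// Hypothetical helper, not part of Tika: illustrates the extra condition only.
final class Windows1252Check {
    // Lookup table indexed by unsigned byte value; true marks the five
    // values listed above that Windows-1252 leaves undefined.
    private static final boolean[] UNDEFINED_IN_CP1252 = new boolean[256];

    static {
        for (int b : new int[] {0x81, 0x8D, 0x8F, 0x90, 0x9D}) {
            UNDEFINED_IN_CP1252[b] = true;
        }
    }

    private Windows1252Check() {
    }

    /**
     * Returns false as soon as a byte that Windows-1252 does not define is
     * found; returns true if nothing in the content rules Windows-1252 out.
     */
    static boolean couldBeWindows1252(byte[] content) {
        for (byte b : content) {
            // Mask to 0..255 because Java bytes are signed.
            if (UNDEFINED_IN_CP1252[b & 0xFF]) {
                return false;
            }
        }
        return true;
    }
}
```

A `true` result does not prove the content is Windows-1252; it only means none of the five excluded byte values appeared, so the detector would still have to choose among the remaining candidates by other means.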
