PeterAlfredLee opened a new pull request #336:
URL: https://github.com/apache/tika/pull/336


   According to these web pages: [Windows-1252 character 
list](https://www.fileformat.info/info/charset/windows-1252/list.htm), 
[ISO-8859-1 character 
list](http://www.fileformat.info/info/charset/ISO-8859-1/list.htm), and 
[ISO-8859-15 character 
list](https://www.fileformat.info/info/charset/ISO-8859-15/list.htm):
   
   There are 5 byte values (0x81, 0x8d, 0x8f, 0x90, 0x9d) that charset 
Windows-1252 does not define but charsets ISO-8859-1 and ISO-8859-15 do.
   
   I think we can add one more check: if the content contains any of these 
byte values, the charset is not Windows-1252.
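
   To illustrate the idea, here is a minimal sketch of such a check. The class 
and method names (`Windows1252ByteCheck`, `containsUndefinedWindows1252Byte`) 
are hypothetical and are not part of Tika's API; the actual change in this pull 
request may be structured differently.

```java
final class Windows1252ByteCheck {

    // The five byte values listed above as undefined in Windows-1252.
    private static final int[] UNDEFINED_IN_WINDOWS_1252 = {0x81, 0x8d, 0x8f, 0x90, 0x9d};

    // Returns true if the single byte is one of the five undefined values.
    static boolean isUndefinedInWindows1252(byte b) {
        int unsigned = b & 0xff; // Java bytes are signed; mask to compare as 0..255
        for (int value : UNDEFINED_IN_WINDOWS_1252) {
            if (unsigned == value) {
                return true;
            }
        }
        return false;
    }

    // Returns true if the content holds at least one byte that Windows-1252
    // does not define, which would rule Windows-1252 out as a candidate.
    static boolean containsUndefinedWindows1252Byte(byte[] content) {
        for (byte b : content) {
            if (isUndefinedInWindows1252(b)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] plain = {'c', 'a', 'f', (byte) 0xe9};
        byte[] withUndefined = {'a', (byte) 0x81, 'b'};
        System.out.println(containsUndefinedWindows1252Byte(plain));
        System.out.println(containsUndefinedWindows1252Byte(withUndefined));
    }
}
```

   A `true` result only rules out Windows-1252; it does not by itself decide 
between ISO-8859-1 and ISO-8859-15, since both define all five values.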


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

