ifly6 opened a new pull request #191:
URL: https://github.com/apache/commons-text/pull/191


   Currently, if given text like so:
   
   ```
   "arma virumque cano&#133;"
   "&#147;bread and circuses&#148;"
   ```
   
   `StringEscapeUtils` will return the Unicode characters at code points 
133, 147, and 148, which are a bunch of obscure, basically-never-used 
control characters that display as spaces. Those values are, however, far more 
often meant as [Windows-1252](https://en.wikipedia.org/wiki/Windows-1252) 
bytes, where that range holds characters like € and ™.
   
   I've changed `NumericEntityUnescaper` to treat valid CP-1252 code points 
between 128 and 159 (inclusive) as CP-1252 characters and decode them to the 
corresponding punctuation marks etc. instead of the obscure Unicode control 
characters.
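   
   For illustration only, here is a minimal sketch of the remapping idea. It is not the code in this PR: the class name `Cp1252Sketch` and the method `remap` are hypothetical, and it assumes the JDK's `windows-1252` charset decodes bytes that are undefined in CP-1252 (such as 0x81) to U+FFFD, in which case the original value is kept.

   ```java
   import java.nio.charset.Charset;

   // Hypothetical helper, not part of commons-text.
   class Cp1252Sketch {
       private static final Charset CP1252 = Charset.forName("windows-1252");

       /** Returns the code point a numeric entity value most likely meant. */
       static int remap(int codePoint) {
           if (codePoint < 128 || codePoint > 159) {
               return codePoint; // outside 128..159: leave unchanged
           }
           // Decode the value as a single CP-1252 byte.
           String decoded = new String(new byte[] {(byte) codePoint}, CP1252);
           char c = decoded.charAt(0);
           // Undefined CP-1252 bytes come back as U+FFFD; keep the original value.
           return c == '\uFFFD' ? codePoint : c;
       }

       public static void main(String[] args) {
           System.out.println(Integer.toHexString(remap(133))); // 2026 (…)
           System.out.println(Integer.toHexString(remap(147))); // 201c (“)
           System.out.println(Integer.toHexString(remap(148))); // 201d (”)
           System.out.println(remap(129)); // 129, undefined in CP-1252
       }
   }
   ```

   Falling back to the original value for the undefined bytes keeps the behaviour unchanged for anything that is not a valid CP-1252 character.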

