ifly6 opened a new pull request #191: URL: https://github.com/apache/commons-text/pull/191
Currently, if given text like so:

```
"arma virumque cano&#133;"
"&#147;bread and circuses&#148;"
```

`StringEscapeUtils` will return the Unicode characters at code points 133, 147, and 148, which are obscure, basically-never-used control characters that display as spaces. Those code points are, however, far more often meant as [Windows-1252](https://en.wikipedia.org/wiki/Windows-1252) values, where the same range holds characters like € and ™.

I've changed `NumericEntityUnescaper` to treat valid CP-1252 code points between 128 and 159 (inclusive) as CP-1252 characters and to decode them to the corresponding punctuation and symbol characters instead of the obscure Unicode control characters.
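To make the intended behaviour concrete, here is a minimal standalone sketch of the remapping idea. It is not the code in this pull request: the class `Cp1252EntitySketch` and its methods `remap` and `unescape` are hypothetical names, it handles only decimal entities, and it assumes that the five code points Windows-1252 leaves unassigned (129, 141, 143, 144, 157) should pass through unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the actual NumericEntityUnescaper change.
class Cp1252EntitySketch {

    // Windows-1252 assignments for 0x80..0x9F; 0 marks the five unassigned slots.
    private static final int[] CP1252_HIGH = {
        0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
        0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
    };

    // Decimal entities only, e.g. "&#133;"; hex entities are out of scope here.
    private static final Pattern DECIMAL_ENTITY = Pattern.compile("&#(\\d{1,7});");

    // Maps 128..159 to the Windows-1252 character where one is assigned;
    // every other code point is returned as-is (assumed fallback).
    static int remap(int codePoint) {
        if (codePoint < 0x80 || codePoint > 0x9F) {
            return codePoint;
        }
        int mapped = CP1252_HIGH[codePoint - 0x80];
        return mapped != 0 ? mapped : codePoint;
    }

    // Replaces each decimal entity with its (possibly remapped) character.
    static String unescape(String input) {
        Matcher m = DECIMAL_ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int cp = remap(Integer.parseInt(m.group(1)));
            // Leave out-of-range values untouched rather than throwing.
            String replacement = Character.isValidCodePoint(cp)
                    ? new String(Character.toChars(cp))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Under these assumptions, `Cp1252EntitySketch.unescape("&#147;bread and circuses&#148;")` yields the text wrapped in U+201C and U+201D rather than in the C1 controls U+0093 and U+0094.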
