[ 
https://issues.apache.org/jira/browse/LANG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051180#comment-14051180
 ] 

Miquel edited comment on LANG-1022 at 7/3/14 8:25 AM:
------------------------------------------------------

I expect it to keep the current behavior (using the function 
NumericEntityUnescaper.translate(CharSequence input, Writer out)):
- If the argument is a negative int like "&#-14844069;", it writes "�".
- If the argument is a positive int or 0 like "\7", it writes the char 
representation (7), for values no greater than MAX_CODE_POINT.
- If the argument doesn't start with "&#", it writes the input to the writer 
unchanged.
- If the argument is not written as a hex entity but contains valid hex 
characters, like "&#aaaa;", it catches a NumberFormatException and writes the 
input "&#aaaa;".
- If the argument is a plain string, it writes the input to the output.

- It throws an IllegalArgumentException if the NumericEntityUnescaper is 
configured with the option errorIfNoSemiColon and the input doesn't end with 
a semicolon.

Why do you think that the expected behavior is to throw an 
IllegalArgumentException if the value is a positive integer greater than 
MAX_CODE_POINT?

I expect it to write the input to the output without throwing an exception.
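
To make the behavior I am asking for concrete, here is a minimal sketch using only the JDK. It is not the Commons Lang code: the class LenientNumericEntitySketch and its methods are hypothetical names. For simplicity the sketch always requires the closing semicolon and leaves anything it cannot decode, including negative values, unchanged. Values that pass the Character.isValidCodePoint check are written as chars; everything else falls through to the output as-is, with no exception.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical sketch, NOT the Commons Lang implementation.
class LenientNumericEntitySketch {

    /**
     * Tries to decode a numeric entity starting at index.
     * Returns the number of input chars consumed, or 0 if nothing was decoded.
     */
    static int translate(CharSequence input, int index, Writer out) throws IOException {
        int len = input.length();
        // Need "&#" plus at least one more char.
        if (index + 2 >= len || input.charAt(index) != '&' || input.charAt(index + 1) != '#') {
            return 0;
        }
        int start = index + 2;
        boolean hex = false;
        char first = input.charAt(start);
        if (first == 'x' || first == 'X') {
            hex = true;
            start++;
        }
        // Scan hex-digit characters, even for decimal entities, so that
        // "&#aaaa;" reaches the parser and fails there.
        int end = start;
        while (end < len && Character.digit(input.charAt(end), 16) >= 0) {
            end++;
        }
        // Assumption of this sketch: the semicolon is always required.
        if (end == start || end >= len || input.charAt(end) != ';') {
            return 0;
        }
        int value;
        try {
            value = Integer.parseInt(input.subSequence(start, end).toString(), hex ? 16 : 10);
        } catch (NumberFormatException e) {
            return 0; // e.g. "&#aaaa;" or a number too large for an int
        }
        // The proposed guard: never hand an out-of-range value to Character.toChars.
        if (!Character.isValidCodePoint(value)) {
            return 0;
        }
        out.write(Character.toChars(value));
        return end + 1 - index;
    }

    /** Applies translate over the whole input, copying undecoded chars as-is. */
    static String unescape(String input) {
        StringWriter out = new StringWriter();
        try {
            int i = 0;
            while (i < input.length()) {
                int consumed = translate(input, i, out);
                if (consumed == 0) {
                    out.write(input.charAt(i));
                    i++;
                } else {
                    i += consumed;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

With this sketch, unescape("&#1114112;") (0x110000, one above MAX_CODE_POINT) returns the input unchanged, while unescape("&#x41;") returns "A".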



was (Author: mcanes):
I expect it to keep the current behavior (using the function 
NumericEntityUnescaper.translate(CharSequence input, Writer out)):
- If the argument is a negative int like "&#-14844069;", it writes "�".
- If the argument is a positive int or 0 like "7", it writes the char 
representation (7), for values no greater than MAX_CODE_POINT.
- If the argument doesn't start with "&#", it writes the input to the writer 
unchanged.
- If the argument is not written as a hex entity but contains valid hex 
characters, like "&#aaaa;", it catches a NumberFormatException and writes the 
input "&#aaaa;".
- If the argument is a plain string, it writes the input to the output.

- It throws an IllegalArgumentException if the NumericEntityUnescaper is 
configured with the option errorIfNoSemiColon and the input doesn't end with 
a semicolon.

Why do you think that the expected behavior is to throw an 
IllegalArgumentException if the value is a positive integer greater than 
MAX_CODE_POINT?

I expect it to write the input to the output without throwing an exception.


> NumericEntityUnescaper.translate throws an IllegalArgumentException if 
> entityValue > MAX_CODE_POINT
> ---------------------------------------------------------------------------------------------------
>
>                 Key: LANG-1022
>                 URL: https://issues.apache.org/jira/browse/LANG-1022
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.text.translate.*
>            Reporter: Miquel
>            Priority: Minor
>
> We found that the function StringEscapeUtils.unescapeHtml4 crashes with an 
> IllegalArgumentException if the argument is "�".
> This happens because internally it calls the function 
> NumericEntityUnescaper.translate, which doesn't check whether the value is 
> greater than 0x10FFFF (MAX_CODE_POINT); that check is made inside 
> Character.toChars, which throws.
> Maybe we need to check that the entity value is not greater than 
> Character.MAX_CODE_POINT.
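
The range check mentioned in the description can be reproduced with the JDK alone. The sketch below is only an illustration (the class CodePointCheckDemo and the helper toCharsThrows are hypothetical names, not part of Commons Lang): it shows that Character.toChars accepts MAX_CODE_POINT itself and rejects the next value, and that Character.isValidCodePoint can be used as a guard before calling it.

```java
// Hypothetical demo class; uses only java.lang.Character.
class CodePointCheckDemo {

    /** Returns true if Character.toChars(value) throws IllegalArgumentException. */
    static boolean toCharsThrows(int value) {
        try {
            Character.toChars(value);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // 0x10FFFF is the largest valid code point.
        System.out.println(toCharsThrows(Character.MAX_CODE_POINT));     // prints false
        // 0x110000 is out of range, so toChars throws.
        System.out.println(toCharsThrows(Character.MAX_CODE_POINT + 1)); // prints true
        // A guard that avoids the exception altogether.
        System.out.println(Character.isValidCodePoint(0x110000));        // prints false
    }
}
```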



--
This message was sent by Atlassian JIRA
(v6.2#6252)
