[ 
https://issues.apache.org/jira/browse/LANG-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584486#comment-14584486
 ] 

Aleksandr Bogush commented on LANG-1148:
----------------------------------------

[[email protected]], first of all, thank you for the reply.

I agree with the backward compatibility point.

However, I ran into this issue while using the method in a work project that 
parses emails. Some email clients insert non-breaking spaces instead of 
ordinary ones, and that cost me a few angry letters from users. I suggest at 
least mentioning this behavior in the documentation ([as the JDK developers 
did|http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isWhitespace(char)]):
 I had to dig into the source code to find out what was wrong, and proper docs 
would have saved me some time.
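The JDK page linked above documents the distinction: Character.isWhitespace explicitly rejects U+00A0, U+2007 and U+202F, while Character.isSpaceChar accepts every Unicode space separator. A minimal Java sketch to see this directly; the class NbspCheck and its describe helper are illustrative names, not part of Commons Lang or the JDK:

```java
// Prints how Character.isWhitespace and Character.isSpaceChar classify an
// ordinary space and the three non-breaking spaces mentioned in LANG-1148.
class NbspCheck {

    // Hypothetical helper: formats one line per character, including its code point.
    static String describe(char c) {
        return String.format("U+%04X isWhitespace=%b isSpaceChar=%b",
                (int) c, Character.isWhitespace(c), Character.isSpaceChar(c));
    }

    public static void main(String[] args) {
        char[] samples = {' ', '\u00A0', '\u2007', '\u202F'};
        for (char c : samples) {
            System.out.println(describe(c));
        }
    }
}
```

Running it prints:

```
U+0020 isWhitespace=true isSpaceChar=true
U+00A0 isWhitespace=false isSpaceChar=true
U+2007 isWhitespace=false isSpaceChar=true
U+202F isWhitespace=false isSpaceChar=true
```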

Anyway, now that you know about the issue, you can decide whether the library 
needs new methods that take non-breaking whitespace into account.
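If such methods were added, one possible shape is sketched below, assuming a character counts as blank when either Character.isWhitespace or Character.isSpaceChar accepts it. BlankCheck and isBlankIncludingNbsp are hypothetical names, not existing Commons Lang API; both JDK methods are real:

```java
// Hypothetical isBlank variant that also treats non-breaking spaces as blank.
class BlankCheck {

    // Returns true for null, "" or a sequence made only of characters that are
    // Java whitespace (Character.isWhitespace) or Unicode space separators
    // (Character.isSpaceChar), so U+00A0, U+2007 and U+202F count as blank.
    static boolean isBlankIncludingNbsp(CharSequence cs) {
        if (cs == null || cs.length() == 0) {
            return true;
        }
        for (int i = 0; i < cs.length(); i++) {
            char c = cs.charAt(i);
            if (!Character.isWhitespace(c) && !Character.isSpaceChar(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isBlankIncludingNbsp("\u00A0\u2007\u202F")); // true
        System.out.println(isBlankIncludingNbsp(" \t\n"));              // true
        System.out.println(isBlankIncludingNbsp("\u00A0x"));            // false
    }
}
```

Keeping Character.isWhitespace in the check preserves the current behavior for tabs and line breaks. Because isWhitespace already accepts every space separator except the three non-breaking ones, the combined test differs from the existing loop only for U+00A0, U+2007 and U+202F.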

> StringUtils.isBlank does not work correctly with strings containing 
> non-breakable whitespace characters
> -------------------------------------------------------------------------------------------------------
>
>                 Key: LANG-1148
>                 URL: https://issues.apache.org/jira/browse/LANG-1148
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>    Affects Versions: 2.6
>         Environment: Windows 8.1 x64, Java 1.8, but can be reproduced in any 
> environment with an official Oracle JDK or JRE
>            Reporter: Aleksandr Bogush
>            Priority: Minor
>              Labels: test
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> isBlank relies on the java.lang.Character.isWhitespace(char ch) method, which 
> has not changed in a long time for backward-compatibility reasons and, as its 
> documentation states, does not treat the non-breaking spaces U+00A0, U+2007 
> and U+202F as whitespace. Such characters do occur in real-world text, so if 
> we execute the code
> {noformat}
> org.apache.commons.lang.StringUtils.isBlank("\u00A0"); // returns false
> org.apache.commons.lang.StringUtils.isBlank("\u202F"); // returns false
> org.apache.commons.lang.StringUtils.isBlank("\u2007"); // returns false
> {noformat}
> all three calls return false, which is not right according to the 
> StringUtils.isBlank documentation: {noformat}Checks if a String is 
> whitespace, empty ("") or null.{noformat}
> I suggest fixing it by using the regex pattern {noformat}"^[\\p{Z}]*$"{noformat} 
> instead of looping over the characters of the string. I know this is somewhat 
> slower than the current implementation, but it would be more correct. I would 
> be glad to implement it myself and write unit tests for it; if you want, 
> please contact me via email at [email protected].
> I would also update the documentation itself, because it does not mention 
> that the method returns true for a string consisting of several whitespace 
> characters.
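A caveat on the proposed pattern: \p{Z} covers only Unicode separator characters, while tab, line feed and carriage return are control characters (category Cc). The pattern "^[\\p{Z}]*$" alone would therefore make a string such as "\t" non-blank, although the current isBlank reports it as blank. A minimal Java sketch comparing the proposal with a hypothetical extension that also accepts \s; the class, field and method names are illustrative only:

```java
import java.util.regex.Pattern;

// Compares the regex proposed in LANG-1148 with a hypothetical extension.
class RegexBlankCheck {

    // Proposed pattern: only Unicode separators (general category Z).
    private static final Pattern SEPARATORS_ONLY = Pattern.compile("^[\\p{Z}]*$");

    // Hypothetical extension: separators plus \s, which by default is [ \t\n\x0B\f\r].
    private static final Pattern SEPARATORS_OR_WHITESPACE = Pattern.compile("[\\p{Z}\\s]*");

    static boolean isBlankProposed(String s) {
        return s == null || SEPARATORS_ONLY.matcher(s).matches();
    }

    static boolean isBlankExtended(String s) {
        return s == null || SEPARATORS_OR_WHITESPACE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isBlankProposed("\u00A0\u2007")); // true
        System.out.println(isBlankProposed("\t"));           // false: tab is not in category Z
        System.out.println(isBlankExtended("\t\u00A0"));     // true
        System.out.println(isBlankExtended(" x "));          // false
    }
}
```

Even the extended pattern is not an exact superset of the current behavior: Character.isWhitespace also accepts U+001C to U+001F, which \s does not match.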



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
