[ https://issues.apache.org/jira/browse/LANG-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801043#comment-13801043 ]

Henri Yandell commented on LANG-910:
------------------------------------

+1 to 'ing'.

On my "less natural" statement, let's use this example:

  assertEquals("ttle", StringUtils.substringMatching("two little pumpkins sitting over there", Pattern.compile("(....) ")));

Here the pattern finds four characters followed by a space. So the first match 
it finds is the "ttle" at the end of "little". 
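To see where "ttle" comes from, here is a minimal sketch using only java.util.regex (not the proposed StringUtils method, which is not shown here). The class and method names are illustrative. Note that "." also matches a space, so the scan tries "two ", "wo l", and so on, and the first position where four characters are followed by a space is the end of "little".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstMatchDemo {
    // Illustrative helper: capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(String text, Pattern pattern) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "two little pumpkins sitting over there";
        Pattern fourThenSpace = Pattern.compile("(....) ");
        // find() scans left to right and stops at the first position that matches.
        System.out.println(firstGroup(text, fourThenSpace)); // prints ttle
    }
}
```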

So let's skip to the next word, just like we would with substring:

  assertEquals("kins", StringUtils.substringMatching("two little pumpkins sitting over there", Pattern.compile("(....) "), 1));

More fool us: because regex capture groups are 1-indexed rather than 0-indexed, 
the answer is "ttle" again.
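For reference, this is how java.util.regex numbers groups: group 0 is the whole match (trailing space included) and capture groups start at 1. A sketch, assuming the proposed index argument maps onto Matcher.group(int); the helper name is hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupIndexDemo {
    // Hypothetical helper: the given group of the FIRST match only (null if nothing matches).
    static String groupOfFirstMatch(String text, Pattern pattern, int group) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String text = "two little pumpkins sitting over there";
        Pattern p = Pattern.compile("(....) ");
        // Group 0 is the entire match, including the trailing space.
        System.out.println("[" + groupOfFirstMatch(text, p, 0) + "]"); // prints [ttle ]
        // Capture groups are numbered from 1, so index 1 is "ttle" again.
        System.out.println("[" + groupOfFirstMatch(text, p, 1) + "]"); // prints [ttle]
        // The pattern declares exactly one capture group.
        System.out.println(p.matcher(text).groupCount()); // prints 1
    }
}
```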

Okay, so let's increment our start index:

  assertEquals("kins", StringUtils.substringMatching("two little pumpkins sitting over there", Pattern.compile("(....) "), 2));

This test fails. The result was <null> and not "kins".

This is because there is only one group in "(....)", regardless of how often it 
matches. So using the substringMatching version never allows us to get to 
"kins". 

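The distinction is between "which group within a match" and "which occurrence of the match". A sketch of both, assuming (to mirror the <null> result above) that an out-of-range group yields null; both helper names are hypothetical and not part of Commons Lang:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupVsOccurrenceDemo {
    // Hypothetical: the index picks a capture group within the FIRST match only.
    // Assumed to return null for a group the pattern does not declare.
    static String groupOfFirstMatch(String str, Pattern pattern, int group) {
        Matcher m = pattern.matcher(str);
        if (group < 0 || group > m.groupCount() || !m.find()) {
            return null;
        }
        return m.group(group);
    }

    // Hypothetical: the index picks the n-th occurrence (0-based) by calling find() repeatedly.
    // Returns capture group 1, so it assumes the pattern has at least one group.
    static String nthMatch(String str, Pattern pattern, int occurrence) {
        Matcher m = pattern.matcher(str);
        for (int i = 0; m.find(); i++) {
            if (i == occurrence) {
                return m.group(1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String text = "two little pumpkins sitting over there";
        Pattern p = Pattern.compile("(....) ");
        System.out.println(groupOfFirstMatch(text, p, 2)); // prints null
        System.out.println(nthMatch(text, p, 1));          // prints kins
    }
}
```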
Moving to substringsMatching, we would get an array back for either the 
undeclared index or the 1-index. Nothing else.
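A sketch of what an array-returning variant might look like, assuming it collects one capture group from every successive match. The signature and the null-for-bad-group behaviour are assumptions, not the attached patch. Even here the only meaningful group index is 1; the occurrence becomes the array position, so "kins" is element [1].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubstringsMatchingSketch {
    // Hypothetical substringsMatching(str, pattern, group): the given capture group
    // from every successive match, or null if the pattern has no such group (assumed).
    static String[] substringsMatching(String str, Pattern pattern, int group) {
        Matcher m = pattern.matcher(str);
        if (group < 0 || group > m.groupCount()) {
            return null;
        }
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group(group));
        }
        return found.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String text = "two little pumpkins sitting over there";
        Pattern p = Pattern.compile("(....) ");
        System.out.println(Arrays.toString(substringsMatching(text, p, 1))); // prints [ttle, kins, ting, over]
        System.out.println(Arrays.toString(substringsMatching(text, p, 2))); // prints null
    }
}
```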

My concern is that this is confusing and not the simplicity a user would be 
aiming for with the code.

> Patch to extend StringUtils
> ---------------------------
>
>                 Key: LANG-910
>                 URL: https://issues.apache.org/jira/browse/LANG-910
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>    Affects Versions: 3.1
>         Environment: Developed on Ubuntu 13.04 with openjdk 7u25
>            Reporter: Timur Yarosh
>              Labels: patch
>             Fix For: 3.2, Discussion
>
>         Attachments: LANG-910.patch, 
> substring-matches-and-white-space-normalize.patch
>
>
> This patch extends StringUtils capabilities: added methods to find 
> substring(s) by Pattern. Also, the method 
> org.apache.commons.lang3.StringUtils#normalizeSpace now replaces char #160 
> (U+00A0, non-breaking space) with a normal whitespace.
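Character 160 (U+00A0, non-breaking space) is not ASCII, and Java's default "\s" regex class does not match it, so it has to be mapped explicitly. A minimal sketch of that idea, assuming normalization means trim plus collapsing whitespace runs; this hypothetical helper is not the Commons Lang implementation or the attached patch:

```java
public class NbspNormalizeSketch {
    // Hypothetical helper: map U+00A0 to an ordinary space, then trim and
    // collapse runs of whitespace to a single space.
    static String normalizeSpaceWithNbsp(String str) {
        if (str == null) {
            return null;
        }
        // trim() only strips chars <= ' ', so U+00A0 must be replaced first.
        return str.replace('\u00A0', ' ').trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println("[" + normalizeSpaceWithNbsp("a\u00A0\u00A0b  c ") + "]"); // prints [a b c]
    }
}
```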



--
This message was sent by Atlassian JIRA
(v6.1#6144)
