[ https://issues.apache.org/jira/browse/LANG-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Duncan Jones updated LANG-806:
------------------------------
Description:
An infinite loop can result if the selection process never returns a char that
passes the validation test.
This can occur if the subset specified by the start and end characters does not
contain any valid characters.
For example:
{code:java}
RandomStringUtils.random(3, 5, 10, true, true); // 1
RandomStringUtils.random(3, 56192, 56319, false, false); // 2
{code}
There is also the case where only surrogates are allowed but the buffer length
is not an even number of characters, for example:
{code:java}
RandomStringUtils.random(3, 56320, 57343, false, false); // 3
{code}
The second example is easy to detect, but in general it does not seem easy to
determine in advance if the subset contains any valid characters - except by
evaluating all the possible char values. This would be expensive if the subset
range is large.
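A minimal sketch of the exhaustive check mentioned above, to show why its cost grows with the size of the range. The class and method names are hypothetical and not part of RandomStringUtils. It assumes that a char is rejected when it fails the letter/digit filter or lies in 56192..56319 (the range from example 2), and that the end value is exclusive.

```java
final class RangeScanSketch {

    // Range used in example 2; assumed here to be rejected unconditionally.
    private static final int PRIVATE_HIGH_START = 56192;
    private static final int PRIVATE_HIGH_END = 56319;

    /**
     * Returns true if at least one char in [start, end) would pass the
     * assumed validation. The cost is linear in (end - start).
     */
    static boolean hasValidChar(int start, int end, boolean letters, boolean digits) {
        for (int c = start; c < end; c++) {
            if (c >= PRIVATE_HIGH_START && c <= PRIVATE_HIGH_END) {
                continue; // assumed never accepted
            }
            char ch = (char) c;
            boolean passesFilter = (letters && Character.isLetter(ch))
                    || (digits && Character.isDigit(ch))
                    || (!letters && !digits);
            if (passesFilter) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasValidChar(5, 10, true, true));          // example 1
        System.out.println(hasValidChar(56192, 56319, false, false)); // example 2
    }
}
```

This scan would flag examples 1 and 2, but not example 3, which depends on the requested count as well as the range.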
One possibility is to count the total number of loop iterations (or retries)
and throw an error if the count exceeds a given value. Alternatively, count
only the number of consecutive retries.
In both cases the threshold value must be set high enough to allow for the
cases where the allowable char range contains only a small proportion of valid
characters.
In the case of digits only, the default allowable range is currently set to
digits + letters, so the proportion of valid chars is 10/90, i.e. approx. 11%.
The threshold would need to tolerate a minimum proportion of 1% or 0.1% in
order to reduce the number of false positives.
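The consecutive-retry idea could be sketched roughly as follows. This is an illustration under stated assumptions, not the attached patch: the class, method names, the digits-only filter, and the use of IllegalStateException are all hypothetical. The helper retryLimit estimates how many consecutive rejections to allow so that a range with valid proportion p trips the limit by chance with probability at most q, using (1 - p)^n <= q.

```java
import java.util.Random;

final class BoundedRetrySketch {

    /** Smallest n such that (1 - minProportion)^n <= falsePositive. */
    static int retryLimit(double minProportion, double falsePositive) {
        return (int) Math.ceil(Math.log(falsePositive) / Math.log(1.0 - minProportion));
    }

    /**
     * Builds a string of `count` digits drawn from [start, end), giving up
     * after too many consecutive rejected chars instead of looping forever.
     */
    static String randomDigits(int count, int start, int end,
                               int maxConsecutiveRetries, Random random) {
        StringBuilder sb = new StringBuilder(count);
        int retries = 0;
        while (sb.length() < count) {
            char ch = (char) (start + random.nextInt(end - start));
            if (Character.isDigit(ch)) {
                sb.append(ch);
                retries = 0; // reset: only consecutive rejections are counted
            } else if (++retries > maxConsecutiveRetries) {
                throw new IllegalStateException(
                        "No valid char after " + maxConsecutiveRetries + " consecutive retries");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int limit = retryLimit(0.01, 1e-9); // tolerate ranges with only 1% valid chars
        System.out.println(randomDigits(3, 32, 123, limit, new Random()));
        try {
            randomDigits(3, 5, 10, limit, new Random()); // range from example 1
        } catch (IllegalStateException e) {
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```

With these assumptions, tolerating a 1% proportion needs a limit of a couple of thousand retries, and 0.1% roughly ten times that, so the bound costs little in the normal case.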
was:
An infinite loop can result if the selection process never returns a char that
passes the validation test.
This can occur if the subset specified by the start and end characters does not
contain any valid characters.
For example:
RandomStringUtils.random(3, 5, 10, true, true); // 1
RandomStringUtils.random(3, 56192, 56319, false, false); // 2
There's also the case where only surrogates are allowed, but the buffer is not
an even number of characters, for example:
RandomStringUtils.random(3, 56320, 57343, false, false); // 3
The second example is easy to detect, but in general it does not seem easy to
determine in advance if the subset contains any valid characters - except by
evaluating all the possible char values. This would be expensive if the subset
range is large.
One possibility is to count the total number of loops (or retries), and throw
an error if it exceeds a given value. Or count the number of consecutive
retries.
In both cases the threshold value must be set high enough to allow for the
cases where the allowable char range contains only a small proportion of valid
characters.
In the case of digits only, the default allowable range is currently set to
digits + letters, so the proportion of valid chars is 10/90 i.e. approx 11%.
A minimum proportion of 1% or 0.1% would be necessary to reduce the number of
false positives.
> RandomStringUtils can enter infinite loop if chosen char does not meet
> letter/digit requirements
> ------------------------------------------------------------------------------------------------
>
> Key: LANG-806
> URL: https://issues.apache.org/jira/browse/LANG-806
> Project: Commons Lang
> Issue Type: Bug
> Components: lang.*
> Affects Versions: 2.6, 3.1
> Reporter: Sebb
> Fix For: Review Patch
>
> Attachments: LANG-806.patch, RandomStringException.java