garydgregory commented on a change in pull request #751:
URL: https://github.com/apache/commons-lang/pull/751#discussion_r631012728
##########
File path: src/main/java/org/apache/commons/lang3/StringUtils.java
##########
@@ -9638,6 +9638,79 @@ public static String wrapIfMissing(final String str, final String wrapWith) {
         return builder.toString();
     }
+    /**
+     * Collects all the numbers found in the passed string and returns them as a list.
+     * Note that by 'number' this method means any digit sequence that may (but need not)
+     * contain a dot, i.e. plain decimal numbers such as 51.82.
+     *
+     * For example, for the string "21.2 days 3 minutes 22 seconds" the resulting
+     * list of doubles is [21.2, 3.0, 22.0].
+     *
+     * If the string contains an invalid number, a {@link NumberFormatException} is thrown.
+     * For example, "My height is 1234.23.13" is invalid because it is not clear how to
+     * parse 1234.23.13. However, if a second dot is followed by anything other than
+     * a digit, the input is considered valid. For example, this string contains
+     * only valid numbers: "My pulse is 90.123. and weight is 78.2".
+     * Here the sequence "90.123." is read as 90.123, just as the sequence
+     * "90." (with no digit right after the dot) is read as the double 90.0.
+     *
+     * @param stringThatContainsNumbers string that contains one or more numbers;
+     *                                  not necessarily integers, they may have a decimal point
+     * @return list of the numbers that the string contains
+     *
+     * @throws NumberFormatException if the string contains an invalid number, as described above
+     */
+    public static List<Double> extractNumbersFromString(String stringThatContainsNumbers) {
+        boolean hasDigitAlreadyStarted = false;
+        boolean alreadyMetDotInThisNumber = false;
+
+        List<Double> resultList = new ArrayList<>();
+
+        StringBuilder currentNumberAsStringBuilder = new StringBuilder("");
+
+        for (int i = 0; i < stringThatContainsNumbers.length(); i++) {
+            char currentSymbol = stringThatContainsNumbers.charAt(i);
+            if (Character.isDigit(currentSymbol)) {
+                if (!hasDigitAlreadyStarted) {
+                    hasDigitAlreadyStarted = true;
+                }
+                currentNumberAsStringBuilder.append(currentSymbol);
+                continue;
+            } else if (currentSymbol == '.') {
Review comment:
You can implement an API to do that if you want, but my main point is that
the code should work with input from any locale, which means handling both
periods and commas as either the decimal or the thousands separator. The other
point I was attempting to make is that I do not feel this code belongs in
Commons Lang; it feels too much like NLP code to me. It might be something for
Commons Text, but it seems to be quite a specific use case, not generic enough
for a Commons library. I encourage others in the community to opine. The NLP
nature makes me wonder how you would handle input like "I'd like 10,000 nuts,
9,000 bolts, but only up to $10.5." and "Send $4.50.5 apples too please."
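
To illustrate the locale point, here is a minimal sketch (not the code from this PR) that delegates separator handling to the JDK's `java.text.NumberFormat`. The class name `LocaleNumberSketch` and its `parse` helper are hypothetical names made up for this example, and the sketch assumes the caller already knows the locale and has isolated a single numeric token, which the proposed method does not do.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Commons Lang or this PR.
final class LocaleNumberSketch {

    // Parses one numeric token using the separators of the given locale.
    // Throws NumberFormatException unless the whole token is consumed.
    static double parse(String token, Locale locale) {
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        ParsePosition position = new ParsePosition(0);
        Number number = format.parse(token, position);
        if (number == null || position.getIndex() != token.length()) {
            throw new NumberFormatException(
                "Not a complete number for locale " + locale + ": " + token);
        }
        return number.doubleValue();
    }

    public static void main(String[] args) {
        // In Locale.US the comma groups thousands and the period is the decimal point.
        System.out.println(parse("10,000", Locale.US));
        System.out.println(parse("10.5", Locale.US));
        // In Locale.GERMANY the roles of the two characters are swapped.
        System.out.println(parse("10.000", Locale.GERMANY));
        System.out.println(parse("10,5", Locale.GERMANY));
    }
}
```

With this approach "10,000" and "10.000" both come out as ten thousand in their respective locales, while a token such as "4.50.5" is rejected because parsing stops at the second decimal separator. It does not address the harder problem of deciding where a number starts and ends inside free text.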