[GitHub] commons-lang pull request: LANG-1124: Add StringUtils split by len...

rikles Sat, 16 May 2015 05:30:53 -0700

Github user rikles commented on a diff in the pull request:

    https://github.com/apache/commons-lang/pull/75#discussion_r30460736
  
    --- Diff: src/main/java/org/apache/commons/lang3/StringUtils.java ---
    @@ -3277,6 +3277,164 @@ public static String substringBetween(final String 
str, final String open, final
             return list.toArray(new String[list.size()]);
         }
     
    +    /**
    +     * <p>Split a String into an array, using an array of fixed string 
lengths.</p>
    +     *
    +     * <p>If not null String input, the returned array size is same as the 
input lengths array.</p>
    +     *
    +     * <p>A null input String returns {@code null}.
    +     * A {@code null} or empty input lengths array returns an empty array.
    +     * A {@code 0} in the input lengths array results in en empty 
string.</p>
    +     *
    +     * <p>Extra characters are ignored (ie String length greater than sum 
of split lengths).
    +     * All empty substrings other than zero length requested, are returned 
{@code null}.</p>
    +     *
    +     * <pre>
    +     * StringUtils.splitByLength(null, *)      = null
    +     * StringUtils.splitByLength("abc")        = []
    +     * StringUtils.splitByLength("abc", null)  = []
    +     * StringUtils.splitByLength("abc", [])    = []
    +     * StringUtils.splitByLength("", 2, 4, 1)  = [null, null, null]
    +     *
    +     * StringUtils.splitByLength("abcdefg", 2, 4, 1)     = ["ab", "cdef", 
"g"]
    --- End diff --
    
    As the next line of the diff says, `StringUtils.splitByLength("abcdefg", 2, 2)` will return `["ab", "cd"]`. Likewise:
    `StringUtils.splitByLength("abcdefghij", 2, 4, 1)  = ["ab", "cdef", "g"]`
    
    I asked myself this question during development: do we discard the extra characters?
    I think it would be nice to let users decide. Moreover, depending on the use case, it could be useful to keep or discard the "first" extra characters, i.e. those at the start of the string (for example, when parsing a single-line commented-out string).
    
    I propose to:
      * add a private `splitByLengthWorker(String string, boolean splitFromEnd, boolean discardExtraChar, int ... lengths)`
      * keep the logic of this `splitByLength(String, int ...)` method as the default: `return splitByLengthWorker(string, false, true, lengths)`. So, by default, the returned array has the same size as the `int ... lengths` parameter, which is useful when parsing strings with fixed column lengths.
      * add a `splitByLengthKeepExtraChar(String, int ...)`: `return splitByLengthWorker(string, false, false, lengths)`
      * add a `splitByLengthFromEnd(String, int ...)`: `return splitByLengthWorker(string, true, true, lengths)`
      * add a `splitByLengthFromEndKeepExtraChar(String, int ...)`: `return splitByLengthWorker(string, true, false, lengths)`
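
    To make the keep/discard choice concrete, here is a minimal sketch of the two split-from-start variants. It is not the code from the pull request: the class name and the private helper `splitFromStart` are hypothetical, and returning a shorter last chunk when the string runs out mid-field is my assumption, since the quoted Javadoc is cut off before that case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch for discussion only, not the implementation in the PR.
class SplitByLengthSketch {

    // Default behaviour: split from the start and discard extra characters.
    static String[] splitByLength(String str, int... lengths) {
        return splitFromStart(str, true, lengths);
    }

    // Proposed variant: extra characters become one additional trailing element.
    static String[] splitByLengthKeepExtraChar(String str, int... lengths) {
        return splitFromStart(str, false, lengths);
    }

    private static String[] splitFromStart(String str, boolean discardExtraChar, int... lengths) {
        if (str == null) {
            return null;
        }
        if (lengths == null || lengths.length == 0) {
            return new String[0];
        }
        List<String> parts = new ArrayList<>();
        int pos = 0;
        for (int len : lengths) {
            if (len < 0) {
                throw new IllegalArgumentException("negative length: " + len);
            }
            if (len == 0) {
                parts.add("");            // a requested length of 0 yields an empty string
            } else if (pos >= str.length()) {
                parts.add(null);          // input exhausted: null, as in the Javadoc
            } else {
                // Assumption: a partial chunk is returned if fewer than len characters remain.
                int end = len > str.length() - pos ? str.length() : pos + len;
                parts.add(str.substring(pos, end));
                pos = end;
            }
        }
        if (!discardExtraChar && pos < str.length()) {
            parts.add(str.substring(pos)); // keep whatever is left over
        }
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // prints [ab, cdef, g]
        System.out.println(Arrays.toString(splitByLength("abcdefghij", 2, 4, 1)));
        // prints [ab, cdef, g, hij]
        System.out.println(Arrays.toString(splitByLengthKeepExtraChar("abcdefghij", 2, 4, 1)));
    }
}
```

    With the discarding variant the result always has as many elements as `lengths`; the keeping variant may return one more.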
    
    A question: for the _split from end_ methods, which result do you think is more logical: lengths _right-aligned_ (RA) or applied _end to start_ (E2S), and result _reversed_ (R) or _not reversed_ (NR)?
      * `StringUtils.splitByLengthFromEndKeepExtraChar("__abcdef", 1, 2, 3)  = ["__", "a", "bc", "def"]` - (RA, NR)
      * `StringUtils.splitByLengthFromEndKeepExtraChar("__abcdef", 1, 2, 3)  = ["def", "bc", "a", "__"]` - (RA, R)
      * `StringUtils.splitByLengthFromEndKeepExtraChar("__abcdef", 1, 2, 3)  = ["f", "de", "abc", "__"]` - (E2S, R)
      * `StringUtils.splitByLengthFromEndKeepExtraChar("__abcdef", 1, 2, 3)  = ["__", "abc", "de", "f"]` - (E2S, NR)
    
    I think the first one is more readable, since we can visually understand the splitting, but it may be less intuitive:
    ```
    StringUtils.splitByLengthFromEnd("ABCDEFGHIJKLM", 3, 4, 5)  = ["BCD", "EFGH", "IJKLM"]
     [3][4_][_5_]
    ABCDEFGHIJKLM
    ```
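
    For comparison, the first option (right-aligned lengths, result not reversed) could be sketched roughly as below. This is only an illustration: the class name, the method name `splitFromEnd` and its `keepExtraChar` flag are hypothetical, and returning `null` for leading fields when the string is too short is my assumption, mirroring the split-from-start Javadoc.

```java
// Hypothetical sketch of the "RA, NR" option, not code from the PR.
class SplitFromEndSketch {

    // Lengths are right-aligned against the end of the string;
    // the result is returned in reading order (not reversed).
    static String[] splitFromEnd(String str, boolean keepExtraChar, int... lengths) {
        if (str == null) {
            return null;
        }
        if (lengths == null || lengths.length == 0) {
            return new String[0];
        }
        String[] fields = new String[lengths.length];
        int end = str.length();
        // Walk the lengths from last to first, consuming the string from its end.
        for (int i = lengths.length - 1; i >= 0; i--) {
            int len = lengths[i];
            if (len < 0) {
                throw new IllegalArgumentException("negative length: " + len);
            }
            if (len == 0) {
                fields[i] = "";
            } else if (end == 0) {
                fields[i] = null;          // assumption: nothing left for this field
            } else {
                int start = Math.max(end - len, 0);
                fields[i] = str.substring(start, end);
                end = start;
            }
        }
        if (!keepExtraChar || end == 0) {
            return fields;
        }
        // Leading extra characters become the first element.
        String[] result = new String[fields.length + 1];
        result[0] = str.substring(0, end);
        System.arraycopy(fields, 0, result, 1, fields.length);
        return result;
    }
}
```

    Under these assumptions, `splitFromEnd("ABCDEFGHIJKLM", false, 3, 4, 5)` gives `["BCD", "EFGH", "IJKLM"]` and `splitFromEnd("__abcdef", true, 1, 2, 3)` gives `["__", "a", "bc", "def"]`.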



