http://nagoya.apache.org/bugzilla/show_bug.cgi?id=3273 (Sat Aug 25 23:38:56 2001)

+============================================================================+
| CharacterArrayCharacterIterator substring function returns incorrect resul |
+----------------------------------------------------------------------------+
| Bug #:       3273                   Product:     Regexp                    |
| Status:      NEW                    Version:     unspecified               |
| Resolution:                         Platform:    All                       |
| Severity:    Normal                 OS/Version:  All                       |
| Priority:    Other                  Component:   Other                     |
+----------------------------------------------------------------------------+
| Assigned To: [EMAIL PROTECTED]                                             |
| Reported By: [EMAIL PROTECTED]                                             |
| CC list:     Cc:                                                           |
+----------------------------------------------------------------------------+
| URL:         .../api/org/apache/regexp/CharacterArrayCharacterIterator.ht  |
+============================================================================+
| DESCRIPTION                                                                |
Using the RE.match(CharacterIterator, int) function with a
"CharacterArrayCharacterIterator" and then calling "getParen(int)" often
returns a string of the wrong length or throws an exception.

This is caused by the implementation of "substring(int, int)" in the
CharacterArrayCharacterIterator class and/or by incorrect documentation of
the CharacterIterator.substring interface.

The confusion is over whether the second argument to substring is the
endIndex or the length. The API docs say it is the length, but the RE
implementation and the StringCharacterIterator implementation both treat
it as the endIndex.
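To make the two readings concrete, here is a small standalone sketch in plain Java with no regexp dependency. The helpers `lengthStyle` and `endIndexStyle` are hypothetical names, not the actual CharacterArrayCharacterIterator code; they assume the iterator is a window `(src, off, ...)` over a char array, as in the proposed fix. Reading the second argument as a length yields a string that is too long when beginIndex > 0, and runs off the array for groups near the end.

```java
public class SubstringMismatchDemo {
    // Hypothetical helper: treats the second argument as a LENGTH,
    // which is what the API docs describe.
    static String lengthStyle(char[] src, int off, int beginIndex, int second) {
        return new String(src, off + beginIndex, second);
    }

    // Hypothetical helper: treats the second argument as an END INDEX,
    // which is how RE and StringCharacterIterator call substring.
    static String endIndexStyle(char[] src, int off, int beginIndex, int second) {
        return new String(src, off + beginIndex, second - beginIndex);
    }

    public static void main(String[] args) {
        // With off = 2, window index 0 is 'h'; the window holds
        // "hello world" (11 chars) and "yy" is trailing data.
        char[] src = "xxhello worldyy".toCharArray();
        int off = 2;

        // A group spanning [1, 4) should be "ell".
        System.out.println(endIndexStyle(src, off, 1, 4)); // ell
        System.out.println(lengthStyle(src, off, 1, 4));   // ello (one char too long)

        // A group spanning [6, 11) should be "world".
        System.out.println(endIndexStyle(src, off, 6, 11)); // world
        try {
            System.out.println(lengthStyle(src, off, 6, 11));
        } catch (IndexOutOfBoundsException e) {
            // off + 6 + 11 = 19 runs past the 15-char array.
            System.out.println("IndexOutOfBoundsException");
        }
    }
}
```

This matches both symptoms described above (wrong length, or an exception), assuming getParen passes the group's end position as the second argument.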
[Note: the standard Java String class has
    java.lang.String.substring(int beginIndex, int endIndex)
but its constructor is
    java.lang.String(char[] src, int off, int len)]

Secondly, there is no check that the requested substring stays within the
bounds of the sequence length specified at construction time. An
IndexOutOfBoundsException should be thrown in that case.

I think the best solution is first to update the API docs to state that the
arguments are in fact (beginIndex, endIndex), and then to update the
CharacterArrayCharacterIterator.substring functions to something like this:

    // Returns the characters in [beginIndex, endIndex) of the sequence,
    // matching the convention of java.lang.String.substring(int, int).
    public String substring(int beginIndex, int endIndex)
    {
        if (beginIndex < 0)
            throw new IndexOutOfBoundsException("beginIndex=" + beginIndex);
        if (endIndex > len)
            throw new IndexOutOfBoundsException("endIndex=" + endIndex
                + "; sequence size=" + len);
        if (beginIndex > endIndex)
            throw new IndexOutOfBoundsException("beginIndex=" + beginIndex
                + " > endIndex=" + endIndex);
        // The String constructor takes (offset, count), so convert the
        // end index into a length here.
        return new String(src, off + beginIndex, endIndex - beginIndex);
    }

    // Returns the characters from beginIndex to the end of the sequence.
    public String substring(int beginIndex)
    {
        if (beginIndex < 0 || beginIndex > len)
            throw new IndexOutOfBoundsException("index=" + beginIndex
                + "; sequence size=" + len);
        return new String(src, off + beginIndex, len - beginIndex);
    }
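One way to sanity-check the proposed semantics is to compare them against java.lang.String.substring over the same window. The sketch below uses a hypothetical stand-in class, `CharArrayWindow`, with the same `src`/`off`/`len` fields as the proposal; it is not the org.apache.regexp class and does not implement its CharacterIterator interface. It also shows why the bounds check matters: without it, reading past `len` would silently return trailing array data.

```java
// Hypothetical stand-in for CharacterArrayCharacterIterator, used only to
// exercise the (beginIndex, endIndex) semantics proposed above.
public class CharArrayWindow {
    private final char[] src;
    private final int off;
    private final int len;

    public CharArrayWindow(char[] src, int off, int len) {
        this.src = src;
        this.off = off;
        this.len = len;
    }

    // Characters [beginIndex, endIndex) of the window, like String.substring.
    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > len || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException("beginIndex=" + beginIndex
                + ", endIndex=" + endIndex + "; sequence size=" + len);
        }
        return new String(src, off + beginIndex, endIndex - beginIndex);
    }

    // Characters [beginIndex, len) of the window.
    public String substring(int beginIndex) {
        return substring(beginIndex, len);
    }

    public static void main(String[] args) {
        char[] src = "xxhello worldyy".toCharArray();
        CharArrayWindow w = new CharArrayWindow(src, 2, 11);
        String expected = "hello world"; // the same window as a plain String

        // Every valid (begin, end) pair must agree with String.substring.
        for (int b = 0; b <= expected.length(); b++) {
            for (int e = b; e <= expected.length(); e++) {
                if (!w.substring(b, e).equals(expected.substring(b, e))) {
                    throw new AssertionError("mismatch at " + b + "," + e);
                }
            }
        }

        // endIndex 13 is past len (11), even though src itself is longer,
        // so this must throw rather than return "worldyy".
        try {
            w.substring(6, 13);
            System.out.println("no exception (unexpected)");
        } catch (IndexOutOfBoundsException ex) {
            System.out.println("caught: " + ex.getMessage());
        }
    }
}
```

Delegating the one-argument form to the two-argument form keeps all bounds checks in one place; that is a choice made for this sketch, not part of the proposal in the report.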
