Hi Amit,
thanks for spotting this; even if the code was not going to crash (as
the first character after the authority is still in the allocated memory
and must be a / or a NULL, and both would have failed the isHex test,
avoiding that the second isHex would access memory after the end of the
string) I added the extra check to make it clear.
Thanks,
Alberto
Il 19/07/2012 23:00, Amit K ha scritto:
Hi,
I am using the C++ distribution of xerces 2.7, the implementation of
the function in question in it is:
bool XMLUri::isValidRegistryBasedAuthority(const XMLCh* const authority,
const int authLen)
{
// check authority int index = 0;
while (index < authLen)
{
if (isUnreservedCharacter(authority[index]) ||
(XMLString::indexOf(REG_NAME_CHARACTERS, authority[index]) != -1))
{
index++;
}
else if (authority[index] == chPercent) // '%' {
*if (XMLString::isHex(authority[index+1]) && // 1st
hex XMLString::isHex(authority[index+2]) ) // 2nd
hex index +=3;*
else
return false;
}
else
return false;
} //while
return true;
}
I've boldened the lines which I want to further discuss here.
and also note that I've seen that the implementation is the same in
the latest version of xerces.
However, I noticed that the Java implementation is different in
relation to the same lines:
/**
+ * Determines whether the given string is a registry based authority.
+ *
+ * @param authority the authority component of a URI
+ *
+ * @return true if the given string is a registry based authority
+ */
+ private boolean isValidRegistryBasedAuthority(String authority) {
+ int index = 0;
+ int end = authority.length();
+ char testChar;
+
+ while (index < end) {
+ testChar = authority.charAt(index);
+
+ // check for valid escape sequence
+ if (testChar == '%') {
+ *if (index+2 >= end ||
+ !isHex(authority.charAt(index+1)) ||
+ !isHex(authority.charAt(index+2))) {*
+ return false;
+ }
+ index += 2;
+ }
+ // can check against path characters because the set
+ // is the same except for '/' which we've already excluded.
+ else if (!isPathCharacter(testChar)) {
+ return false;
+ }
+ ++index;
+ }
+ return true;
+ }
The important difference is of course, the bounds check on the string
/ character array. Why is it omitted in the C++ version ?
I thought it might be because the string can be assume to be null
terminated in the C++ version.. but I can't be sure whether it's just
a bug or not.
I would thank your reply
Sincerely,
Amit
.