Hi Amit,
Thanks for spotting this. The code was not going to crash: the first character after the authority is still in allocated memory and must be either a '/' or the null terminator, and both fail the isHex test, so the && short-circuits and the second isHex never reads past the end of the string. Even so, I added the extra check to make the intent clear.
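
For illustration only, here is a minimal sketch of an explicit bounds check of this kind. It is not the change committed to Xerces: the names isHexChar and isValidEscapedName are hypothetical, plain char replaces XMLCh, and the set of accepted characters is reduced to ASCII letters and digits to keep the example self-contained.

```cpp
#include <cctype>

// Hypothetical stand-in for XMLString::isHex, working on plain char.
static bool isHexChar(char c)
{
    return std::isxdigit(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical, simplified validator: accepts only ASCII letters, digits
// and well-formed %XX escapes within the first len characters of text.
static bool isValidEscapedName(const char* const text, const int len)
{
    int index = 0;
    while (index < len)
    {
        if (std::isalnum(static_cast<unsigned char>(text[index])))
        {
            index++;
        }
        else if (text[index] == '%')
        {
            // Explicit bounds check: both hex digits must lie inside
            // [0, len), so nothing at or beyond text[len] is ever read.
            if (index + 2 < len &&
                isHexChar(text[index + 1]) &&
                isHexChar(text[index + 2]))
                index += 3;
            else
                return false;
        }
        else
            return false;
    }
    return true;
}
```

With the check in place, a '%' that is not followed by two characters inside the given length is rejected without relying on what happens to be stored after the authority.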

Thanks,
Alberto

On 19/07/2012 23:00, Amit K wrote:
Hi,


I am using the C++ distribution of Xerces 2.7. Its implementation of
the function in question is:


bool XMLUri::isValidRegistryBasedAuthority(const XMLCh* const authority,
                                            const int authLen)
{
     // check authority
     int index = 0;
     while (index < authLen)
     {
         if (isUnreservedCharacter(authority[index]) ||
             (XMLString::indexOf(REG_NAME_CHARACTERS, authority[index]) != -1))
         {
             index++;
         }
         else if (authority[index] == chPercent)               // '%'
         {
             *if (XMLString::isHex(authority[index+1]) &&     // 1st hex
                 XMLString::isHex(authority[index+2])  )     // 2nd hex
                 index +=3;*
             else
                 return false;
         }
         else
             return false;
     } //while
     return true;
}


I've marked in bold the lines that I want to discuss further here.

Note also that, as far as I have seen, the implementation is the same in
the latest version of Xerces.

However, I noticed that the Java implementation is different in
relation to the same lines:

/**
 * Determines whether the given string is a registry based authority.
 *
 * @param authority the authority component of a URI
 *
 * @return true if the given string is a registry based authority
 */
private boolean isValidRegistryBasedAuthority(String authority) {
  int index = 0;
  int end = authority.length();
  char testChar;

  while (index < end) {
    testChar = authority.charAt(index);

    // check for valid escape sequence
    if (testChar == '%') {
      *if (index+2 >= end ||
          !isHex(authority.charAt(index+1)) ||
          !isHex(authority.charAt(index+2))) {*
          return false;
      }
      index += 2;
    }
    // can check against path characters because the set
    // is the same except for '/' which we've already excluded.
    else if (!isPathCharacter(testChar)) {
      return false;
    }
    ++index;
  }
  return true;
}


The important difference is, of course, the bounds check on the string
/ character array. Why is it omitted in the C++ version?

I thought it might be because the string can be assumed to be null
terminated in the C++ version, but I can't be sure whether it's just
a bug or not.

I would appreciate your reply.


Sincerely,

Amit