Simon White created XERCESC-1987:
------------------------------------

             Summary: Transcoding Issue with single XMLCh to utf8
                 Key: XERCESC-1987
                 URL: https://issues.apache.org/jira/browse/XERCESC-1987
             Project: Xerces-C++
          Issue Type: Bug
          Components: Utilities
    Affects Versions: 2.8.0
         Environment: Windows XP
            Reporter: Simon White


There appears to be an issue when transcoding a single character to UTF-8.  
Conditions:

Input string = a single Chinese character (the XMLCh holds the value 27493, 
i.e. U+6B65).

Problem code in TranscodeToStr::transcode:

    unsigned int allocSize = len * sizeof(XMLCh);
    fString = (XMLByte*)fMemoryManager->allocate(allocSize);

This code sets the output buffer to two bytes (len is 1 and XMLCh is two bytes 
wide).  The issue is that the character in question converts to a 3-byte UTF-8 
sequence.  It therefore hits this in XMLUTF8Transcoder.cpp:

        //  If we cannot fully get this char into the output buffer,
        //  then leave it for the next time.
        //
        if (outPtr + encodedBytes > outEnd)
            break;
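
The size mismatch can be checked in isolation.  The following is a minimal 
sketch, not Xerces code: utf8EncodedLength and reportedAllocSize are 
hypothetical helper names, and char16_t stands in for a 2-byte XMLCh.

```cpp
#include <cstddef>
#include <cstdint>

// Number of bytes needed to encode one Unicode code point in UTF-8.
// Returns 0 for values outside the Unicode range.
std::size_t utf8EncodedLength(std::uint32_t codePoint)
{
    if (codePoint < 0x80)     return 1;
    if (codePoint < 0x800)    return 2;
    if (codePoint < 0x10000)  return 3;
    if (codePoint < 0x110000) return 4;
    return 0;
}

// Mirrors the allocation quoted in the report, assuming a 2-byte character
// type (char16_t here as a stand-in for XMLCh).
std::size_t reportedAllocSize(std::size_t len)
{
    return len * sizeof(char16_t);
}
```

For the value 27493, utf8EncodedLength gives 3 while reportedAllocSize(1) gives 
2, so the single character cannot fit in the buffer.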

Since only a single character is being converted, the transcoder reports 0 
characters consumed, and the caller then hits this because nothing could be 
transcoded:

        if(charsRead == 0)
            ThrowXMLwithMemMgr(TranscodingException, XMLExcepts::Trans_BadSrcSeq, fMemoryManager);

The source sequence is not invalid; the output buffer has simply been limited 
to the size of the input buffer.  Is adding a few spare characters to 
allocSize the correct fix?
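
For comparison, a worst-case bound may be safer than a fixed number of spare 
characters: a UTF-16 code unit in the Basic Multilingual Plane needs at most 3 
UTF-8 bytes, and a surrogate pair (2 code units) needs 4.  The sketch below 
illustrates that bound with a stand-alone encoder.  It is an assumption-laden 
example, not the Xerces implementation or its eventual fix; worstCaseUtf8Size 
and toUtf8 are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Upper bound on UTF-8 bytes for `len` UTF-16 code units, plus a terminator.
// A BMP code unit needs at most 3 bytes; a surrogate pair (2 units) needs 4.
std::size_t worstCaseUtf8Size(std::size_t len)
{
    return len * 3 + 1;
}

// Hypothetical stand-alone encoder (not Xerces code): converts UTF-16 to
// UTF-8. Unpaired surrogates are replaced with U+FFFD.
std::string toUtf8(const std::u16string& src)
{
    std::string out;
    out.reserve(worstCaseUtf8Size(src.size()));
    for (std::size_t i = 0; i < src.size(); ++i) {
        std::uint32_t cp = src[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < src.size()
            && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            // Combine a valid surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With this bound, the single character 27493 encodes to the 3 bytes E6 AD A5, 
which fits within worstCaseUtf8Size(1).  Growing the buffer on demand when the 
transcoder consumes nothing would be another option.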
