[ 
https://issues.apache.org/jira/browse/XERCESC-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289994#comment-13289994
 ] 

Lee Doron commented on XERCESC-1984:
------------------------------------

The problem with "if(charsDone < len)" is that it increases (doubles) allocSize 
even if the transcoder exited for reasons other than running out of available 
output buffer space; that can happen if the input buffer ends with the leading 
character of a surrogate pair. Why increase it if there might be plenty of 
space?

I suggest changing the conditional to:

    if(charsDone < len && (allocSize - fBytesWritten) < 4)

This ensures that there are at least 4 bytes available, which is always enough 
to hold at least one more multi-byte character, so charsRead won't be 0 for 
lack of space. (I don't believe any encodings use more than 4 bytes for a 
character, right?)
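
To make the effect of that condition concrete, here is a minimal sketch of such a grow-as-needed loop. It is not the Xerces-C++ source: encodeChunk() is a hypothetical stand-in for a transcodeTo()-style call, toUtf8() is an invented wrapper, and the buffer is a std::vector rather than memory from a MemoryManager. It also assumes the initial allocation is at least 4 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a transcodeTo()-style call: encodes as many whole
// code points as fit into dst, reports the UTF-16 units consumed in charsRead,
// and returns the number of bytes written.
static std::size_t encodeChunk(const char16_t* src, std::size_t srcCount,
                               char* dst, std::size_t dstCapacity,
                               std::size_t& charsRead)
{
    std::size_t in = 0, out = 0;
    auto put = [&](char32_t v) { dst[out++] = static_cast<char>(v); };
    while (in < srcCount) {
        char32_t cp = src[in];
        std::size_t units = 1;
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // Lead surrogate: stop if the trail is missing or malformed.
            if (in + 1 >= srcCount || src[in + 1] < 0xDC00 || src[in + 1] > 0xDFFF)
                break;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[in + 1] - 0xDC00);
            units = 2;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            break; // unpaired trail surrogate
        }
        const std::size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (dstCapacity - out < need)
            break; // not enough room for this character
        switch (need) {
        case 1: put(cp); break;
        case 2: put(0xC0 | (cp >> 6)); put(0x80 | (cp & 0x3F)); break;
        case 3: put(0xE0 | (cp >> 12)); put(0x80 | ((cp >> 6) & 0x3F));
                put(0x80 | (cp & 0x3F)); break;
        default: put(0xF0 | (cp >> 18)); put(0x80 | ((cp >> 12) & 0x3F));
                 put(0x80 | ((cp >> 6) & 0x3F)); put(0x80 | (cp & 0x3F)); break;
        }
        in += units;
    }
    charsRead = in;
    return out;
}

// Invented wrapper showing the loop with the suggested growth condition.
std::string toUtf8(const std::u16string& in, std::size_t initialAlloc = 16)
{
    assert(initialAlloc >= 4); // assumption: room for one 4-byte character
    const std::size_t len = in.size();
    std::size_t allocSize = initialAlloc;
    std::vector<char> buf(allocSize);
    std::size_t bytesWritten = 0, charsDone = 0;
    while (charsDone < len) {
        std::size_t charsRead = 0;
        bytesWritten += encodeChunk(in.data() + charsDone, len - charsDone,
                                    buf.data() + bytesWritten,
                                    allocSize - bytesWritten, charsRead);
        if (charsRead == 0)
            throw std::runtime_error("bad source sequence");
        charsDone += charsRead;
        // Grow only when input remains AND fewer than 4 bytes are free.
        if (charsDone < len && (allocSize - bytesWritten) < 4) {
            allocSize *= 2;
            buf.resize(allocSize);
        }
    }
    return std::string(buf.begin(), buf.begin() + bytesWritten);
}
```

With six 3-byte characters and a 16-byte initial buffer, the first pass writes 15 bytes, the check sees 1 free byte, the buffer doubles, and the second pass writes the last character.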

Likewise, I'd change the corresponding conditional in 
TranscodeFromStr::transcode() from:

    if(((allocSize - fCharsWritten)*sizeof(XMLCh)) < (length - bytesDone))

to:

    if(bytesDone < length && (allocSize - fCharsWritten) < 2)

There's no reason to multiply by sizeof(XMLCh) here. However, we do need to 
make sure there's enough room for the largest representation we might get from 
a transcoder, which is 2 XMLCh entries (a surrogate pair).
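
The reverse direction can be sketched the same way. Again this is only an illustration under assumptions: decodeChunk() is a hypothetical stand-in for a transcodeFrom()-style call, fromUtf8() is an invented wrapper, char16_t plays the role of XMLCh, and the UTF-8 validation is deliberately minimal (it does not reject overlong forms or out-of-range values).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a transcodeFrom()-style call: decodes as many
// whole UTF-8 sequences as fit into dst, reports the bytes consumed in
// bytesEaten, and returns the number of UTF-16 units written.
static std::size_t decodeChunk(const unsigned char* src, std::size_t srcCount,
                               char16_t* dst, std::size_t dstCapacity,
                               std::size_t& bytesEaten)
{
    std::size_t in = 0, out = 0;
    while (in < srcCount) {
        const unsigned char b = src[in];
        const std::size_t need = b < 0x80 ? 1 : (b >> 5) == 0x06 ? 2
                               : (b >> 4) == 0x0E ? 3 : (b >> 3) == 0x1E ? 4 : 0;
        if (need == 0 || srcCount - in < need)
            break; // invalid lead byte or truncated sequence
        char32_t cp = need == 1 ? b : (b & (0xFF >> (need + 1)));
        bool ok = true;
        for (std::size_t k = 1; k < need; ++k) {
            if ((src[in + k] & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (src[in + k] & 0x3F);
        }
        if (!ok)
            break; // malformed continuation byte
        const std::size_t units = cp >= 0x10000 ? 2 : 1;
        if (dstCapacity - out < units)
            break; // not enough room, possibly for a surrogate pair
        if (units == 1) {
            dst[out++] = static_cast<char16_t>(cp);
        } else {
            cp -= 0x10000;
            dst[out++] = static_cast<char16_t>(0xD800 + (cp >> 10));
            dst[out++] = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
        in += need;
    }
    bytesEaten = in;
    return out;
}

// Invented wrapper showing the loop with the suggested growth condition.
std::u16string fromUtf8(const std::string& in, std::size_t initialAlloc = 16)
{
    assert(initialAlloc >= 2); // assumption: room for one surrogate pair
    const std::size_t length = in.size();
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(in.data());
    std::size_t allocSize = initialAlloc;
    std::vector<char16_t> buf(allocSize);
    std::size_t charsWritten = 0, bytesDone = 0;
    while (bytesDone < length) {
        std::size_t bytesRead = 0;
        charsWritten += decodeChunk(bytes + bytesDone, length - bytesDone,
                                    buf.data() + charsWritten,
                                    allocSize - charsWritten, bytesRead);
        if (bytesRead == 0)
            throw std::runtime_error("bad source sequence");
        bytesDone += bytesRead;
        // Counted in output units, so no multiplication by the unit size.
        if (bytesDone < length && (allocSize - charsWritten) < 2) {
            allocSize *= 2;
            buf.resize(allocSize);
        }
    }
    return std::u16string(buf.begin(), buf.begin() + charsWritten);
}
```

Calling fromUtf8("a\xF0\x9F\x98\x80", 2) exercises the interesting case: after "a" only one slot is free, which is too small for the surrogate pair, so the buffer grows before the second pass.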

These could be simplified slightly by moving the entire "if" blocks to the very 
beginning of each loop. At that point, we know that "charsDone < len" (or, 
respectively, "bytesDone < length"), and we can leave out the first clause of 
each conditional. The reallocation will always be skipped the first time 
through the loop.
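
A sketch of that restructuring, with the check hoisted to the top of the loop body. The names are invented for illustration: transcodeLoop() is a generic wrapper and asciiEncode() is a trivial ASCII-only stand-in for a real transcoder call. The first-pass skip relies on the assumption that the initial allocation is at least 4 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical ASCII-only stand-in for a transcodeTo()-style call: copies
// units below 0x80 until the output is full or a non-ASCII unit is met.
static std::size_t asciiEncode(const char16_t* src, std::size_t srcCount,
                               char* dst, std::size_t dstCapacity,
                               std::size_t& charsRead)
{
    std::size_t n = 0;
    while (n < srcCount && n < dstCapacity && src[n] < 0x80) {
        dst[n] = static_cast<char>(src[n]);
        ++n;
    }
    charsRead = n;
    return n;
}

// Invented wrapper: the growth check sits at the top of the loop, where the
// loop condition already guarantees that input remains.
template <class Encode>
std::string transcodeLoop(const std::u16string& in, Encode encode,
                          std::size_t initialAlloc = 16)
{
    assert(initialAlloc >= 4); // assumption: makes the first-pass check a no-op
    const std::size_t len = in.size();
    std::size_t allocSize = initialAlloc;
    std::vector<char> buf(allocSize);
    std::size_t bytesWritten = 0, charsDone = 0;
    while (charsDone < len) {
        // No "charsDone < len" clause needed here.
        if ((allocSize - bytesWritten) < 4) {
            allocSize *= 2;
            buf.resize(allocSize);
        }
        std::size_t charsRead = 0;
        bytesWritten += encode(in.data() + charsDone, len - charsDone,
                               buf.data() + bytesWritten,
                               allocSize - bytesWritten, charsRead);
        if (charsRead == 0)
            throw std::runtime_error("bad source sequence");
        charsDone += charsRead;
    }
    return std::string(buf.begin(), buf.begin() + bytesWritten);
}
```

For a 40-character ASCII input, transcodeLoop(input, asciiEncode) fills the 16-byte buffer on the first pass without growing, then doubles at the top of the second and third passes.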
                
> TranscodeToStr::transcode throws an exception when transcoding to UTF-8
> -----------------------------------------------------------------------
>
>                 Key: XERCESC-1984
>                 URL: https://issues.apache.org/jira/browse/XERCESC-1984
>             Project: Xerces-C++
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 3.2.0, 4.0.0
>         Environment: Bug reproducible on a Red Hat 5 based platform. The bug 
> doesn't seem to be platform specific though.
>            Reporter: Dan PV
>              Labels: exception, transcode
>         Attachments: transtest2.cpp
>
>
> This issue relates to the bug fix for issue XERCESC-1947. There are still 
> cases where the method fails to produce a transcoded result and throws an 
> exception instead. See the attached "transtest2.cpp" to reproduce the 
> issue.
> The cause seems to be the added 
> "if((allocSize - fBytesWritten) < (len - charsDone))" condition in 
> "TranscodeToStr::transcode". In my test case I have a string composed of 
> 6 Japanese characters (i.e. "絞り込み検索"). After the first call to 
> "XMLUTF8Transcoder::transcodeTo", "charsRead" reports that 5 XMLCh were 
> read. Since the buffer initially allocated for this string is 16 bytes, 
> the condition is evaluated as "if((16 - 15) < (6 - 5))", which is false, 
> so no larger buffer is allocated for the UTF-8 encoded version of the 
> string. 
> Since the reallocation doesn't take place, the code calls 
> "XMLUTF8Transcoder::transcodeTo" again, but this time "charsRead" is set 
> to 0 because there is insufficient space in the buffer, and this triggers 
> an exception of type "Trans_BadSrcSeq".
> I suppose that the goal of this added condition was to avoid an unnecessary 
> reallocation of the buffer, but unfortunately it doesn't work when 
> transcoding to a variable-length encoding like UTF-8. The solution is 
> probably to simply replace the condition with "if(charsDone < len)".
> Regards,
> Dan
