[ https://issues.apache.org/jira/browse/DERBY-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500527 ]

Rick Hillegas commented on DERBY-2694:
--------------------------------------

Hi, Anurag. I think that the patch does the right thing. However, it's a little 
tricky to read. I think that the following approach is easier to understand. 
What do you think? I'm not an expert on UTF-8 encoding, but the following web
page was useful to me: http://www.unix.org.ua/orelly/java/fclass/appb_01.htm

// A UTF-8 continuation byte has the bit pattern 10xxxxxx.
private static final byte MULTI_BYTE_MASK = (byte) 0xC0;
private static final byte CONTINUATION_BYTE = (byte) 0x80;

if (writeLen != origLen) // if we're truncating the string
{
    // byteval[ writeLen ] is the first byte that will NOT be sent.
    // Back up while it continues the character before it.
    while ( isContinuationChar( byteval[ writeLen ] ) ) { writeLen--; }

    //
    // Now byteval[ writeLen ] is either a standalone 1-byte char
    // or the first byte of a multi-byte character. That means that
    // byteval[ writeLen - 1 ] is the last (perhaps only) byte of the
    // previous character.
    //
}

private static boolean isContinuationChar( byte b )
{
    return ( (b & MULTI_BYTE_MASK) == CONTINUATION_BYTE );
}
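
For anyone who wants to try the approach outside DDMWriter, here is a self-contained version of the same backward scan. It is a sketch, assuming standard UTF-8 input; the class SplitPoint, the method safeWriteLength and its maxLen parameter are hypothetical and are not part of Derby. It also adds a writeLen > 0 guard so that malformed input starting with continuation bytes cannot index below zero:

```java
import java.nio.charset.StandardCharsets;

public class SplitPoint {
    // A UTF-8 continuation byte has the bit pattern 10xxxxxx.
    private static final byte MULTI_BYTE_MASK = (byte) 0xC0;
    private static final byte CONTINUATION_BYTE = (byte) 0x80;

    private static boolean isContinuationChar(byte b) {
        return (b & MULTI_BYTE_MASK) == CONTINUATION_BYTE;
    }

    // Hypothetical helper: how many of the origLen bytes in byteval can be
    // sent when at most maxLen fit, without cutting a character in half.
    static int safeWriteLength(byte[] byteval, int origLen, int maxLen) {
        int writeLen = Math.min(origLen, maxLen);
        if (writeLen != origLen) { // we're truncating the string
            // byteval[writeLen] is the first unsent byte; back up while it
            // continues the previous character.
            while (writeLen > 0 && isContinuationChar(byteval[writeLen])) {
                writeLen--;
            }
        }
        return writeLen;
    }

    public static void main(String[] args) {
        byte[] b = "a\u20AC".getBytes(StandardCharsets.UTF_8); // 61 E2 82 AC
        System.out.println(safeWriteLength(b, b.length, 3));   // prints 1
        System.out.println(safeWriteLength(b, b.length, 4));   // prints 4
    }
}
```

With a 3-byte limit the scan backs up past 82 and AC to the lead byte E2, so only "a" is sent and the euro sign stays intact for the next send; with a 4-byte limit nothing is truncated.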

> org.apache.derby.impl.drda.DDMWriter uses wrong algorithm to avoid splitting 
> varchar in the middle of a multibyte char.
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2694
>                 URL: https://issues.apache.org/jira/browse/DERBY-2694
>             Project: Derby
>          Issue Type: Bug
>          Components: Network Server
>         Environment: all
>            Reporter: Anurag Shekhar
>            Assignee: Anurag Shekhar
>             Fix For: 10.3.0.0
>
>         Attachments: derby-2694-v2.diff, derby-2694.diff, TestProc.java, 
> TestProc_TruncateRep.java
>
>
> org.apache.derby.impl.drda.DDMWriter uses the wrong algorithm to avoid
> splitting a varchar in the middle of a multibyte char.
> When DDMWriter finds that it has to split a varchar while sending it to the
> client, it checks whether the last byte is part of a multibyte char. If it
> is, it tries to find the last byte of the previous char and sends only up to
> that byte, leaving the rest for the next send.
> The code it uses has a bug: it fails when the last byte it checks is the
> third byte of a 3-byte char.
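
For the failing case in the report it helps to see how each byte of a multibyte character is marked. The following is a minimal sketch, assuming standard UTF-8 (RFC 3629) as produced by String.getBytes; Utf8ByteKinds and kind are hypothetical names used only for illustration. It classifies every byte of a 1-, 2- and 3-byte character by its high-order bits:

```java
import java.nio.charset.StandardCharsets;

public class Utf8ByteKinds {
    // Classify one UTF-8 byte by its high-order bit pattern.
    static String kind(byte b) {
        int u = b & 0xFF;                               // treat the byte as unsigned
        if ((u & 0x80) == 0x00) return "single";        // 0xxxxxxx
        if ((u & 0xC0) == 0x80) return "continuation";  // 10xxxxxx
        if ((u & 0xE0) == 0xC0) return "lead(2)";       // 110xxxxx
        if ((u & 0xF0) == 0xE0) return "lead(3)";       // 1110xxxx
        if ((u & 0xF8) == 0xF0) return "lead(4)";       // 11110xxx
        return "invalid";
    }

    public static void main(String[] args) {
        // 'A' is 1 byte, U+00E9 is 2 bytes, U+20AC (euro sign) is 3 bytes.
        String s = "A\u00E9\u20AC";
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            System.out.printf("%02X %s%n", b & 0xFF, kind(b));
        }
    }
}
```

Running it prints 41 single, C3 lead(2), A9 continuation, E2 lead(3), 82 continuation, AC continuation. The third byte of the euro sign (AC) looks exactly like its second byte, so the byte that decides whether a cut is safe is the one after the cut: a lead or single byte there means the character before it is complete.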

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.