Re: [Bug-readline] rl_point, multibyte strings, and the cursor position

Ulf Magnusson Tue, 17 Feb 2015 14:21:12 -0800

On Tue, Feb 17, 2015 at 9:30 PM, Ulf Magnusson <[email protected]> wrote:
> On Tue, Feb 17, 2015 at 9:28 PM, Ulf Magnusson <[email protected]> wrote:
>> Thanks for the feedback!
>>
>> The -1 comparison should be safe in practice on non-exotic systems
>> where the size (rank) of size_t is at least that of int, but yeah,
>> it's kinda pointless and stupid to leave out the cast.
>>
>> I think I'll roll the mbrtowc -2 case into the error case as wc_len <
>> 0 for now. It'd be weird if mbrtowc gave -2 when handed MB_CUR_MAX
>> bytes, but it's worth checking for at least.
>>
>> I added control character handling by doing the following btw:
>>
>> width += iswcntrl(wc) ? 2 : max(0, wcwidth(wc));
>>
>> Guess that might catch more characters than it should though.
>>
>> I also noticed that readline outputs things like "~Z" for some (meta?)
>> characters. Might want to get back to that later...
>>
>> /Ulf
>
> (Excuse the top-posting by the way. Gmail keeps tripping me up. :P)
>
> /Ulf


(wc_len < 0 would not work of course. What I really meant was to handle
the -2 case the same as the -1 case.)
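
For anyone following along, here is a rough sketch of how the function
might look with both suggestions folded in. The error branch is only my
guess at a reasonable policy, since the rest of the quoted function is cut
off below: it counts one column per undecodable byte, resets the shift
state, and moves on. The control character rule is the one from my earlier
mail, written out without the max() macro.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* needed for the wcwidth() declaration */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Returns the total width (in columns) of the characters in the 'n'-byte
 * prefix of the null-terminated multibyte string 's'. */
static size_t strnwidth(const char *s, size_t n) {
    mbstate_t shift_state;
    wchar_t wc;
    size_t wc_len;
    size_t width = 0;

    assert(s != NULL);

    /* Start in the initial shift state. */
    memset(&shift_state, 0, sizeof shift_state);

    for (size_t i = 0; i < n; i += wc_len) {
        /* Extract the next multibyte character. */
        wc_len = mbrtowc(&wc, s + i, MB_CUR_MAX, &shift_state);
        if (wc_len == 0)
            /* Reached the end of the string. */
            break;
        if (wc_len == (size_t)-1 || wc_len == (size_t)-2) {
            /* Invalid or incomplete sequence. Assumed policy: count the
             * offending byte as one column, reset the state, skip it. */
            memset(&shift_state, 0, sizeof shift_state);
            width += 1;
            wc_len = 1;
            continue;
        }
        if (iswcntrl(wc)) {
            /* Control characters are shown as two columns (e.g. "^A"). */
            width += 2;
        } else {
            /* wcwidth() returns -1 for non-printable characters. */
            int w = wcwidth(wc);
            if (w > 0)
                width += (size_t)w;
        }
    }

    return width;
}
```

The intended use would be something like strnwidth(rl_line_buffer,
rl_point), after a call to setlocale(LC_ALL, "") so that mbrtowc and
wcwidth follow the user's locale.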

/Ulf

>
>>
>> On Tue, Feb 17, 2015 at 5:44 PM, Chet Ramey <[email protected]> wrote:
>>> On 2/16/15 4:52 PM, Ulf Magnusson wrote:
>>>> On Mon, Feb 16, 2015 at 4:43 PM, Ulf Magnusson <[email protected]> wrote:
>>>>> I'll try it. Thanks for the suggestion!
>>>>>
>>>>> /Ulf
>>>>>
>>>>
>>>> Here's what I came up with in case someone else runs into the same
>>>> problem. I'm sure there's more stuff to handle (not sure what to do
>>>> for non-printable characters for example), but it seems to handle
>>>> multibyte (tested using åäö's and Chinese) and combining characters
>>>> correctly for UTF-8 at least:
>>>
>>> This is basically what an implementation of wcswidth looks like.  A couple
>>> of suggestions:
>>>
>>>> // Returns the total width (in columns) of the characters in the 'n'-byte
>>>> // prefix of the null-terminated multibyte string 's'. If 'n' is larger
>>>> // than 's', returns the total width of the string. Suitable for
>>>> // calculating a cursor position.
>>>> //
>>>> // Makes a guess for malformed strings.
>>>> static size_t strnwidth(const char *s, size_t n) {
>>>>     mbstate_t shift_state;
>>>>     wchar_t wc;
>>>>     size_t wc_len;
>>>>     size_t width = 0;
>>>>
>>>>     // Start in the initial shift state.
>>>>     memset(&shift_state, '\0', sizeof shift_state);
>>>>
>>>>     for (size_t i = 0; i < n; i += wc_len) {
>>>>         // Extract the next multibyte character.
>>>>         wc_len = mbrtowc(&wc, s + i, MB_CUR_MAX, &shift_state);
>>>>         if (wc_len == 0)
>>>>             // Reached the end of the string.
>>>>             break;
>>>>         if (wc_len == -1)
>>>
>>> wc_len is a size_t, which is usually unsigned.  You need to cast the -1
>>> to (size_t)-1.  You also need to handle mbrtowc returning (size_t)-2.
>>>
>>>
>>> --
>>> ``The lyf so short, the craft so long to lerne.'' - Chaucer
>>>                  ``Ars longa, vita brevis'' - Hippocrates
>>> Chet Ramey, ITS, CWRU    [email protected]    http://cnswww.cns.cwru.edu/~chet/

_______________________________________________
Bug-readline mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/bug-readline
