On 04/28/2010 01:36 AM, Duncan Gibson wrote:

> The first problem here is that the various macros to handle extended
> mappings, ie. ERRORS_TO_ISO8859_1, ERRORS_TO_CP1252 and STRICT_RFC3629
> only apply to the fl_utf8decode() function in fl_utf.c [from FLTK2 ?]
> The other functions there, e.g. fl_utf8fwd() and fl_utf8back() assume
> they have true utf-8 sequences only, and I don't think they will handle
> isolated CP1252 0x80-0x9f characters properly. But I need to check this.

No matter how the ERRORS_TO_ macros are set, every setting agrees that
the length of an error is 1 byte, so the macros do not affect the
movement functions.

"back" works by moving back up to 3 bytes, trying to find a "lead" byte
for UTF-8. It then checks that this byte starts a legal UTF-8 character
and that the character's length extends past the original location. If
either check fails, it assumes the current byte is an error and returns
the pointer unchanged.

Thus the back function actually moves the pointer to the start of the 
current character.

Numerous tests were done with random byte streams to confirm that back
always moves to the same boundary that the forward function does.
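
For illustration only, here is a minimal sketch of the scheme described
above. It is not the code in fl_utf.c: the names seq_len(), sketch_back()
and count_mismatches() are hypothetical, the decoder is simplified (no
overlong or surrogate checks), and it assumes start <= p < end. It shows
the two points made above: an error is always 1 byte long, and back only
accepts a lead byte whose sequence reaches past the original location.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the sequence starting at p: the length of a well-formed
   UTF-8 sequence, or 1 for anything else (an "error" byte).
   Simplified: overlong and surrogate encodings are not rejected. */
static int seq_len(const unsigned char *p, const unsigned char *end)
{
    int n, i;
    unsigned char c = *p;
    if (c < 0xC2) return 1;        /* ASCII, stray trail byte, bad lead */
    else if (c < 0xE0) n = 2;
    else if (c < 0xF0) n = 3;
    else if (c < 0xF5) n = 4;
    else return 1;                 /* 0xF5..0xFF never start a sequence */
    if (end - p < n) return 1;     /* truncated sequence: error */
    for (i = 1; i < n; i++)
        if ((p[i] & 0xC0) != 0x80) return 1;   /* missing trail byte */
    return n;
}

/* Return the start of the character that contains p. */
static const unsigned char *sketch_back(const unsigned char *p,
                                        const unsigned char *start,
                                        const unsigned char *end)
{
    int i;
    assert(start <= p && p < end);
    if ((*p & 0xC0) != 0x80) return p;   /* not a trail byte: a start */
    for (i = 1; i <= 3 && p - i >= start; i++) {
        const unsigned char *a = p - i;
        if ((*a & 0xC0) == 0x80) continue;       /* still a trail byte */
        if (a + seq_len(a, end) > p) return a;   /* sequence covers p */
        break;                                   /* it does not reach p */
    }
    return p;                            /* treat *p as a 1-byte error */
}

/* Fill a buffer with pseudo-random bytes, walk it forward, and count
   the positions where sketch_back() disagrees with the forward walk. */
static int count_mismatches(unsigned seed, size_t n)
{
    unsigned char buf[256];
    size_t i, cur = 0;
    int bad = 0;
    if (n > sizeof buf) n = sizeof buf;
    for (i = 0; i < n; i++) {
        seed = seed * 1103515245u + 12345u;      /* simple LCG */
        buf[i] = (unsigned char)(seed >> 16);
    }
    while (cur < n) {
        size_t next = cur + (size_t)seq_len(buf + cur, buf + n);
        for (i = cur; i < next; i++)
            if (sketch_back(buf + i, buf, buf + n) != buf + cur) bad++;
        cur = next;
    }
    return bad;
}
```

Calling count_mismatches() from a small main() with a range of seeds
gives a rough version of the random-stream test; a nonzero result would
mean back and forward disagree on some boundary.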

> The second problem is that fl_utf.c isn't the only source of functions.
> There's also O'ksi'D's fl_utf8.cxx implementation, which includes the
> fl_utf8len(char c) function. As Bill pointed out, fl_utf8len() does
> not have the full context to determine whether a byte is a CP1252
> 0x80-0x9f byte or a utf-8 trailing byte. According to fl_utf8len()
> the length of a utf-8 trailing byte is -1. That's the issue here.
>
> The third problem: the Fl_Text_* code not only uses fl_utf8len() but
> also does a lot of its own bit testing against 0x80 and 0x40 masks,
> which muddies the waters rather a lot.

All the above code has to be removed to get the text editor working.

> And finally, the Fl_Text_* code is also doing some "smart" expansion
> of specific characters, tab to spaces, 0x01-0x1f and DEL 0x7f to
> readable mnemonic forms, and then trying to handle top-bit "utf-8"
> characters using fl_utf8len(). There's no testing for the CP1252
> 0x80-0x9f characters first, and the ERRORS_TO_CP1252 macro is not
> defined in these files anyway.

Such testing should not be necessary, as fl_utf8len() will return 1 for 
the CP1252 characters.

> And finally plus one :-) I haven't checked any of the text handling
> in any of the other widgets at all.

The one-line editor also needs work.
