Mattias Gärtner wrote:

The character sets in SynEdit are 'set of char', which means 8-bit only.
So, I guess the patch tries to fix an ANSI codepage accented chars problem,
right?
The fix is probably useless on other codepages including UTF-8, right?

Not as such. The problem is twofold.

1. If we ignore encoding (i.e. just work in ANSI space), then the old style was simply wrong. It allowed only alpha chars (no digits), and worked on the principle of "what's not alpha isn't part of a word".

2. If we also consider UTF-8 encoded content, then finding words by boundaries (i.e. disallowed chars) rather than by allowed chars means that, as long as the given boundary and whitespace chars are all 7-bit ASCII (< 128, which the default ones are), UTF-8 words will be parsed correctly even if they contain multibyte chars. Every byte of a multibyte UTF-8 sequence is >= 128, so it can never be mistaken for a boundary.

I'm not sure whether #2 also applies to other encodings.
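
To illustrate both points, here is a minimal sketch in Python (not SynEdit's Pascal code). It works on raw bytes the way an 8-bit 'set of char' would. The names split_words, BOUNDARIES and ALPHA are hypothetical, and the boundary set is an assumption for illustration only, not SynEdit's actual default.

```python
import string

# Assumed boundary set for illustration; every member is 7-bit ASCII (< 128).
BOUNDARIES = frozenset(b" \t\r\n.,;:!?()[]{}<>\"'=+-*/\\")

# Old style: only ASCII letters count as word chars (no digits, no accents).
ALPHA = frozenset(string.ascii_letters.encode("ascii"))


def split_words(data, is_break):
    """Split raw bytes into words wherever is_break(byte) is true."""
    words = []
    start = None  # index where the current word began, or None
    for i, byte in enumerate(data):
        if is_break(byte):
            if start is not None:
                words.append(data[start:i])
                start = None
        elif start is None:
            start = i
    if start is not None:
        words.append(data[start:])
    return words


def words_by_allowed(data):
    # "What's not alpha isn't part of a word."
    return split_words(data, lambda b: b not in ALPHA)


def words_by_boundaries(data):
    # Anything that is not an explicit boundary belongs to a word.
    return split_words(data, lambda b: b in BOUNDARIES)


text = "naïve café x2".encode("utf-8")
old = words_by_allowed(text)     # [b'na', b've', b'caf', b'x']
new = words_by_boundaries(text)  # decodes to ['naïve', 'café', 'x2']
```

The allowed-chars version cuts words at every accented letter and drops the digit, while the boundary version never splits inside a multibyte sequence, because none of its bytes can be a member of an ASCII-only boundary set.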

Ales

Mattias

_________________________________________________________________
     To unsubscribe: mail [EMAIL PROTECTED] with
                "unsubscribe" as the Subject
   archives at http://www.lazarus.freepascal.org/mailarchives

