On Fri, 25 Jan 2008 17:25:07 +0100
Ales Katona <[EMAIL PROTECTED]> wrote:
> Mattias Gärtner wrote:
> >
> > The character sets in synedit are 'set of char', which means 8-bit
> > only. So I guess the patch tries to fix a problem with accented
> > chars in ANSI codepages, right?
> > The fix is probably useless on other codepages including UTF-8,
> > right?
>
> Not as such. The problem is twofold.
>
> 1. If we ignore encoding (e.g. just work in ANSI space), then the old
> style was plain wrong. It only allowed alpha (not num) chars, and
> worked on the principle of "what's not alpha isn't a word".
True. But at least it is reliable.
For which codepages does the patch work, and for which does it not?
Maybe the set/check should be configurable. The IDE will eventually
pass only UTF-8 to synedit. Then we need a UTF-8 word boundary test.
> 2. If we also consider UTF-8 encoded content, then getting words by
> boundaries (i.e. not-allowed chars) and not by allowed chars means
> that as long as the given boundaries and whitespace are < 128 (which
> the default ones are), UTF-8 words will be parsed right, even if they
> contain special multibyte chars.
>
> I'm not sure if #2 applies also to some other encoding.
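In UTF-8 every byte of a multibyte char is in #128..#255; #0..#127 is
plain ASCII, as in most other 8-bit codepages. So #2 should hold for
those codepages too, as long as the boundary chars stay in #0..#127.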
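
A rough sketch of the idea, in Python only because it is quick to run.
This is not synedit code: BOUNDARIES and split_words are made-up names,
and the boundary set is just an assumed example. The scan works byte by
byte, like a 'set of char' test would, and never cuts a multibyte char
because none of its bytes can be below 128:

```python
# Hypothetical boundary set: ASCII whitespace and punctuation only,
# so every member is < 128. Not taken from synedit.
BOUNDARIES = frozenset(b" \t\r\n.,;:!?()[]{}<>=+-*/\"'")


def split_words(data: bytes) -> list:
    """Split raw bytes into words, cutting only at boundary bytes."""
    words = []
    current = bytearray()
    for byte in data:  # iterating over bytes yields ints 0..255
        if byte in BOUNDARIES:
            if current:
                words.append(bytes(current))
                current.clear()
        else:
            # Anything that is not a boundary belongs to the word,
            # including digits and all bytes in 128..255.
            current.append(byte)
    if current:
        words.append(bytes(current))
    return words


if __name__ == "__main__":
    text = "Gärtner napísal: žltý kôň, x1"
    words = [w.decode("utf-8") for w in split_words(text.encode("utf-8"))]
    print(words)
```

Each word decodes cleanly as UTF-8 afterwards. The limit is that a
byte set cannot hold a non-ASCII boundary char in UTF-8, since such a
char is more than one byte; that case would need a real UTF-8 test.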
Mattias
_________________________________________________________________
To unsubscribe: mail [EMAIL PROTECTED] with
"unsubscribe" as the Subject
archives at http://www.lazarus.freepascal.org/mailarchives