Re: synedit patch from ales

Mattias Gaertner Fri, 25 Jan 2008 09:04:37 -0800

On Fri, 25 Jan 2008 17:25:07 +0100
Ales Katona <[EMAIL PROTECTED]> wrote:


> Mattias Gärtner wrote:
> >
> > The character sets in synedit are 'set of char', which means only
> > 8bit. So, I guess the patch tries to fix an ANSI codepage accented
> > chars problem, right?
> > The fix is probably useless on other codepages including UTF-8,
> > right? 
> 
> Not as such. The problem is twofold.
> 
> 1. If we ignore encoding (e.g. just work in ANSI space), then the old 
> style was simply plain wrong. It only allowed alpha (not num) chars,
> and worked on the principle of "what's not alpha, isn't a word".

True. But at least it is reliable. 
For which codepages does the patch work, and for which does it not?
Maybe the set/check should be configurable. The IDE will eventually
pass only UTF-8 to synedit. Then we will need a UTF-8 word boundary test.
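
To make the idea of a UTF-8 word boundary test concrete, here is a
minimal sketch in Python (not synedit code). It assumes the text has
already been decoded from UTF-8, and it assumes that letters, digits and
the underscore count as word characters. The names is_word_char and
word_at are hypothetical.

```python
def is_word_char(ch: str) -> bool:
    """Assumed rule: letters, digits and underscore are word characters."""
    return ch.isalnum() or ch == "_"


def word_at(text: str, pos: int) -> str:
    """Return the word covering character index pos, or '' on a boundary."""
    if not (0 <= pos < len(text)) or not is_word_char(text[pos]):
        return ""
    start = pos
    # Walk left while the previous character still belongs to the word.
    while start > 0 and is_word_char(text[start - 1]):
        start -= 1
    end = pos + 1
    # Walk right while the next character still belongs to the word.
    while end < len(text) and is_word_char(text[end]):
        end += 1
    return text[start:end]
```

Because the test works on decoded characters rather than bytes, it does
not depend on a 'set of char' and treats accented letters like any other
letter.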


> 2. If we also consider UTF-8 encoded content, then getting words by 
> boundaries (e.g. not-allowed chars) and not by allowed-chars means
> that as long as the given boundaries and whitespace are < 128 (which the
> default ones are), UTF-8 words will be parsed right, even if they
> contain special multibyte chars.
> 
> I'm not sure if #2 also applies to other encodings.

UTF-8 encodes every non-ASCII character using only bytes in
#128..#255. #0..#127 is plain ASCII, as in most other 8-bit codepages.
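
The byte-range argument can be illustrated with a small Python sketch
(again, not synedit code). The boundary set below is a hypothetical
stand-in for the default boundaries; the only assumption that matters is
that every boundary byte is below 128. Since no byte of a multibyte
UTF-8 sequence is below 128, splitting on these bytes can never cut a
character in half.

```python
# Hypothetical boundary set: ASCII whitespace and punctuation only.
BOUNDARIES = frozenset(b" \t\r\n.,;:()[]{}'\"+-*/=<>")


def split_words(data: bytes) -> list:
    """Split raw UTF-8 bytes into words at ASCII boundary bytes."""
    words = []
    current = bytearray()
    for byte in data:
        if byte in BOUNDARIES:
            # A boundary ends the current word, if there is one.
            if current:
                words.append(bytes(current))
                current.clear()
        else:
            # Any other byte, including #128..#255, is part of the word.
            current.append(byte)
    if current:
        words.append(bytes(current))
    return words
```

This does not carry over to every encoding: in Shift-JIS, for example,
the second byte of a two-byte character can fall below 128, so a
byte-wise boundary check could split a character there.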


Mattias

