Re: synedit patch from ales

2008-01-27 Thread Felipe Monteiro de Carvalho
I just tested the patch and it doesn't fix selecting utf-8 words in
synedit. This doesn't mean it isn't headed in the right direction. I
don't know what is missing; from the description I would expect this
to start working.

-- 
Felipe Monteiro de Carvalho

_
 To unsubscribe: mail [EMAIL PROTECTED] with
unsubscribe as the Subject
   archives at http://www.lazarus.freepascal.org/mailarchives


Re: synedit patch from ales

2008-01-27 Thread Felipe Monteiro de Carvalho
Another test line:

  Caption := 'éé';

Behaves like before patching.

-- 
Felipe Monteiro de Carvalho



Re: synedit patch from ales

2008-01-27 Thread Ales Katona

Felipe Monteiro de Carvalho wrote:

> I tested with this line (utf-8 encoded), specifically the last word:
>
>   Application.Title:='Minha Aplicação';
>
> Double clicking on the left part selects Aplica. Clicking on the
> accented part does nothing and clicking on o selects only o and
> puts the caret at the end of the line.
>
> Lazarus is compiled with build+clean and with the option -dWindowsUnicodeSupport
>
> thanks,

What function does double clicking call? It is possible I missed
some. I'll test it on my spellchecker to see if it selects a whole
word, and if so I'll try double-clicking. Thanks, I'll report soon.


Ales



Re: synedit patch from ales

2008-01-27 Thread Felipe Monteiro de Carvalho
Oh, and another detail. I am not 100% sure, but I think the changes
should be wrapped in ifdef SYN_LAZARUS
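
As a rough illustration of that suggestion, here is a minimal sketch of guarding changed behaviour with a conditional define. The WordRule function is hypothetical and only reports which branch was compiled in; the symbol is defined locally so the sketch stands alone, whereas a real build would presumably get it from include files or compiler options.

```pascal
program IfdefSketch;
{$mode objfpc}{$H+}

// Defined here only so the sketch is self-contained (an assumption);
// a real build would normally supply the symbol from elsewhere.
{$DEFINE SYN_LAZARUS}

// Hypothetical helper: reports which word rule was compiled in.
function WordRule: string;
begin
  {$IFDEF SYN_LAZARUS}
  Result := 'boundary-based (patched)';
  {$ELSE}
  Result := 'original';
  {$ENDIF}
end;

begin
  WriteLn(WordRule);
end.
```

Removing the DEFINE line makes the same source compile the original branch, which is the point of keeping both code paths side by side.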

thanks,
-- 
Felipe Monteiro de Carvalho



Re: synedit patch from ales

2008-01-27 Thread Ales Katona

Felipe Monteiro de Carvalho wrote:

> I tested with this line (utf-8 encoded), specifically the last word:
>
>   Application.Title:='Minha Aplicação';
>
> Double clicking on the left part selects Aplica. Clicking on the
> accented part does nothing and clicking on o selects only o and
> puts the caret at the end of the line.
>
> Lazarus is compiled with build+clean and with the option -dWindowsUnicodeSupport
>
> thanks,

I missed a function. If you use the GetWordAtRowCol,
GetWordBoundsAtRowCol or NextWordPos functions, you get the whole
word/boundaries. Double click in synedit uses SetWordBlock (in the
lazarus/non-lineselect case), which I didn't look at (and it seems to
try some ugly utf-8 conversion which I guess didn't work).
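
To make the double-click case concrete, here is a minimal sketch of finding word bounds around a clicked byte column by scanning outwards to the nearest delimiters. WordBoundsAt and the Delims set are hypothetical stand-ins, not the SynEdit routines named above, and the accented characters are written as raw UTF-8 bytes so the result does not depend on the source file encoding.

```pascal
program WordBoundsSketch;
{$mode objfpc}{$H+}

type
  TCharSet = set of Char;

const
  // Assumed delimiter set; every member is below #128.
  Delims: TCharSet = [#9, ' ', '.', ',', ';', ':', '=', '''', '(', ')'];

// Hypothetical helper: finds the byte range [First, Last] of the word that
// contains byte column Col. Returns False if Col is out of range or sits
// on a delimiter.
function WordBoundsAt(const Line: string; Col: Integer;
  out First, Last: Integer): Boolean;
begin
  First := Col;
  Last := Col;
  Result := (Col >= 1) and (Col <= Length(Line)) and not (Line[Col] in Delims);
  if not Result then
    Exit;
  // Walk outwards until the neighbouring byte is a delimiter.
  while (First > 1) and not (Line[First - 1] in Delims) do
    Dec(First);
  while (Last < Length(Line)) and not (Line[Last + 1] in Delims) do
    Inc(Last);
end;

var
  Line: string;
  Col, First, Last: Integer;
begin
  // The test line from this thread, with the two accented letters
  // given as raw UTF-8 bytes ($C3 $A7 and $C3 $A3).
  Line := 'Application.Title:=''Minha Aplica'#$C3#$A7#$C3#$A3'o'';';
  // Click on every byte of the last word, including bytes inside the
  // two-byte sequences.
  for Col := Pos('Aplica', Line) to Pos('o''', Line) do
    if WordBoundsAt(Line, Col, First, Last) then
      WriteLn('col ', Col, ' -> bytes ', First, '..', Last);
end.
```

Every line printed should report the same byte range, wherever in the word the click lands. Mapping that byte range back to screen columns is a separate step that this sketch leaves out.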


Ales



Re: synedit patch from ales

2008-01-27 Thread Ales Katona

Felipe Monteiro de Carvalho wrote:

> I just tested the patch and it doesn't fix selecting utf-8 words in
> synedit. This doesn't mean it isn't headed in the right direction. I
> don't know what is missing; from the description I would expect this
> to start working.

What words did you test? Can you send the test sample to me? I tested on
some Slovak accented words (i.e. 2-byte chars) and it worked perfectly.


Ales



Re: synedit patch from ales

2008-01-27 Thread Felipe Monteiro de Carvalho
On Jan 27, 2008 11:33 AM, Ales Katona [EMAIL PROTECTED] wrote:
> What words did you test? Can you send the test sample to me? I tested on
> some Slovak accented words (i.e. 2-byte chars) and it worked perfectly.

I tested with this line (utf-8 encoded), specifically the last word:

  Application.Title:='Minha Aplicação';

Double clicking on the left part selects Aplica. Clicking on the
accented part does nothing and clicking on o selects only o and
puts the caret at the end of the line.

Lazarus is compiled with build+clean and with the option -dWindowsUnicodeSupport

thanks,
-- 
Felipe Monteiro de Carvalho



Re: synedit patch from ales

2008-01-25 Thread Felipe Monteiro de Carvalho
Hi,

Please define exactly what this patch fixes.

> The IDE will eventually
> only pass UTF-8 to synedit. Then we need a UTF-8 word boundary test.

I committed a partial implementation for that, wrapped in an ifdef:

http://svn.freepascal.org/cgi-bin/viewvc.cgi?view=rev&root=lazarus&revision=13868

thanks,
-- 
Felipe Monteiro de Carvalho



Re: synedit patch from ales

2008-01-25 Thread Mattias Gaertner
On Fri, 25 Jan 2008 18:30:11 +0100
Felipe Monteiro de Carvalho [EMAIL PROTECTED] wrote:

> On Jan 25, 2008 6:23 PM, Ales Katona [EMAIL PROTECTED] wrote:
> > Yes, and I'm not 100% sure of everything that would constitute
> > (e.g. I don't think there's a valid blockchar in multibyte range),
> > but for 99% of usages the current blockchars (+ whitechars) which
> > are < 127 seem to be working fine.
>
> That's not enough. It already works for ascii characters today.
> Please test with both unicode and non-unicode IDE on strings with
> accented characters.
>
> I am also working on that and it ain't that easy, I can tell for sure.
>
> > P.S: I think synedit will need a lot more work to be 100% utf-8
> > ready on all fronts. All the set of char things will have to go
> > and we'd have to implement utf-8 utf8string[x]
> > operations/functions (afaik fpc doesn't have them yet?)
>
> You mean like that:
>
> http://svn.freepascal.org/cgi-bin/viewvc.cgi/trunk/lcl/lclproc.pas?root=lazarus&r1=13868&r2=13867&pathrev=13868

I think this is pretty slow and needs too much memory.
For example:
It grows the Dest array in steps of one, allocating one memory
block for each character.
Can you explain what you are trying to achieve? Then we can find a
better solution.
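
One way to avoid the per-character growth described above, sketched under assumptions: allocate the position array once at its upper bound (one entry per byte), fill it in a single pass, and trim it at the end. CharStarts is a hypothetical helper and is not the code in the linked revision; it performs no UTF-8 validation.

```pascal
program CharStartIndex;
{$mode objfpc}{$H+}

type
  TIntArray = array of Integer;

// Hypothetical helper: returns the 1-based byte offset of every UTF-8
// character start in S, using two allocations instead of one per character.
function CharStarts(const S: string): TIntArray;
var
  I, Count: Integer;
begin
  SetLength(Result, Length(S));  // upper bound: one character per byte
  Count := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then  // skip continuation bytes (10xxxxxx)
    begin
      Result[Count] := I;
      Inc(Count);
    end;
  SetLength(Result, Count);      // trim to the number of characters found
end;

var
  S: string;
  Starts: TIntArray;
  I: Integer;
begin
  // 'Aplicação' with the accented letters as raw UTF-8 bytes.
  S := 'Aplica'#$C3#$A7#$C3#$A3'o';
  Starts := CharStarts(S);
  WriteLn(Length(S), ' bytes, ', Length(Starts), ' characters');
  for I := 0 to High(Starts) do
    Write(Starts[I], ' ');
  WriteLn;
end.
```

The trade-off is a temporary over-allocation for text with many multibyte characters, in exchange for a fixed number of allocations per line.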

Mattias



Re: synedit patch from ales

2008-01-25 Thread Felipe Monteiro de Carvalho
On Jan 25, 2008 6:48 PM, Mattias Gaertner [EMAIL PROTECTED] wrote:
> I think this is pretty slow and needs too much memory.
> For example:
> It increases the Dest array in steps of one while allocating one mem
> block for each character.
> Can you explain, what are you trying to achieve? Then we can find a
> better solution.

I am trying to make the method SetWordBlock work for utf-8 (I would then
add a check to switch between the current code and the new code so it
works on ansi too).

If you know any other way to make this work, please suggest it. I thought
a lot about this, but I didn't find any other solution.

My first idea was to generate the positions of characters on the fly.
That was so slow one could notice synedit calculating it. I even
tried buffering the generated positions, which was also too slow.

-- 
Felipe Monteiro de Carvalho



Re: synedit patch from ales

2008-01-25 Thread Mattias Gaertner
On Fri, 25 Jan 2008 17:25:07 +0100
Ales Katona [EMAIL PROTECTED] wrote:

> Mattias Gärtner wrote:
>
> > The character sets in synedit are 'set of char', which means only
> > 8bit. So, I guess the patch tries to fix an ANSI codepage accented
> > chars problem, right?
> > The fix is probably useless on other codepages including UTF-8,
> > right?
>
> Not as such. The problem is twofold.
>
> 1. If we ignore encoding (i.e. just work in ansi space), then the old
> style was simply plain wrong. It only allowed alpha (not num) chars,
> and worked on the principle that what's not alpha isn't part of a word.

True. But at least it is reliable.
For which codepages does the patch work, and for which does it
not work?
Maybe the set/check should be configurable. The IDE will eventually
only pass UTF-8 to synedit. Then we need a UTF-8 word boundary test.


> 2. If we also consider UTF-8 encoded content, then getting words by
> boundaries (i.e. not-allowed chars) and not by allowed-chars means
> that as long as the given boundaries and whitespaces are < 127 (which
> the default ones are), UTF-8 words will be parsed right, even if they
> contain special multibyte chars.
>
> I'm not sure if #2 applies also to some other encoding.

UTF-8 encodes non-ASCII characters using only bytes #128..#255.
#0..#127 is plain ASCII, as in most other 8-bit codepages.
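
The byte ranges mentioned here can be checked with a small sketch. ByteKind is a hypothetical helper that classifies each byte of a short UTF-8 string; it does not validate the encoding, so bytes that can never occur in well-formed UTF-8 would simply be reported as lead bytes.

```pascal
program Utf8ByteRanges;
{$mode objfpc}{$H+}

// Hypothetical helper: classifies one byte of a UTF-8 stream.
// No validation is done; any byte >= $C0 is reported as a lead byte.
function ByteKind(B: Byte): string;
begin
  if B < $80 then
    Result := 'ascii'
  else if (B and $C0) = $80 then
    Result := 'continuation'
  else
    Result := 'lead';
end;

var
  S: string;
  I: Integer;
begin
  // 'a', then c-cedilla as the raw UTF-8 bytes $C3 $A7, then 'o'.
  S := 'a'#$C3#$A7'o';
  for I := 1 to Length(S) do
    WriteLn(I, ': $', HexStr(Ord(S[I]), 2), ' ', ByteKind(Ord(S[I])));
end.
```

Because both bytes of the accented letter fall in #128..#255, a delimiter set made only of ASCII characters cannot match in the middle of a multibyte sequence.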


Mattias



Re: synedit patch from ales

2008-01-25 Thread Ales Katona

Mattias Gärtner wrote:

> The character sets in synedit are 'set of char', which means only 8bit.
> So, I guess the patch tries to fix an ANSI codepage accented chars problem,
> right?
> The fix is probably useless on other codepages including UTF-8, right?
  


Not as such. The problem is twofold.

1. If we ignore encoding (i.e. just work in ansi space), then the old
style was simply plain wrong. It only allowed alpha (not num) chars, and
worked on the principle that what's not alpha isn't part of a word.


2. If we also consider UTF-8 encoded content, then getting words by
boundaries (i.e. not-allowed chars) and not by allowed-chars means that
as long as the given boundaries and whitespaces are < 127 (which the
default ones are), UTF-8 words will be parsed right, even if they contain
special multibyte chars.


I'm not sure if #2 applies also to some other encoding.

Ales




Re: synedit patch from ales

2008-01-25 Thread Marc Weustink

Ales Katona wrote:

> Felipe Monteiro de Carvalho wrote:
>
> > On Jan 25, 2008 6:23 PM, Ales Katona [EMAIL PROTECTED] wrote:
> > That's not enough. It already works for ascii characters today.
> > Please test with both unicode and non-unicode IDE on strings with
> > accented characters.
>
> ASCII doesn't have accented chars. If you mean a non-utf, local,
> non-latin1 encoding, then the pre-patch code doesn't work on those
> either: before my patch, anything outside ['A'-'z'] is considered a
> block char.
>
> With my patch, anything NOT listed in TSynWordBlockChars +
> TSynWhiteChars (if there's no highlighter), or in
> Highlighter.WordBlockChars + TSynWhiteChars, is considered a valid
> word character. That MIGHT include some nonsense chars, but at least
> it doesn't block known word chars, and it can be adjusted at runtime
> (in a way that also supports utf-8). In the current situation you
> simply cannot support utf-8, because the allowed chars are a set of
> 8-bit char.

I agree with Ales here. This afternoon Ales and I spoke while he was
implementing this. The current handling in synedit is imo plain
stupid and will never work once the charset grows bigger than 255
chars. Defining a set of nonword + whitespace chars is imo better.
If someone enters a nonsense char and it is considered part of a word,
that is imo a lot better than accepting only lower ascii.


Marc



[lazarus] synedit patch from ales

2008-01-25 Thread Henry Vermaak
http://www.hu.freepascal.org/fpcircbot/cgipastebin?msgid=1618

can someone look into why his mails don't reach the list, please?

thanks

henry



Re: synedit patch from ales

2008-01-25 Thread Henry Vermaak
On 25/01/2008, Henry Vermaak [EMAIL PROTECTED] wrote:
> http://www.hu.freepascal.org/fpcircbot/cgipastebin?msgid=1618


from #lazarus-ide:

Almindor says it fixes word parsing in synedit, especially for accented
chars etc.



Re: synedit patch from ales

2008-01-25 Thread Ales Katona

Felipe Monteiro de Carvalho wrote:

> On Jan 25, 2008 6:23 PM, Ales Katona [EMAIL PROTECTED] wrote:
>
> That's not enough. It already works for ascii characters today.
> Please test with both unicode and non-unicode IDE on strings with
> accented characters.

ASCII doesn't have accented chars. If you mean a non-utf, local,
non-latin1 encoding, then the pre-patch code doesn't work on those
either: before my patch, anything outside ['A'-'z'] is considered a
block char.


With my patch, anything NOT listed in TSynWordBlockChars +
TSynWhiteChars (if there's no highlighter), or in
Highlighter.WordBlockChars + TSynWhiteChars, is considered a valid
word character. That MIGHT include some nonsense chars, but at least
it doesn't block known word chars, and it can be adjusted at runtime
(in a way that also supports utf-8). In the current situation you
simply cannot support utf-8, because the allowed chars are a set of
8-bit char.
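
The difference between the two rules can be shown with a minimal sketch. OldWordChars, WhiteChars and BlockChars are illustrative stand-ins, not the actual SynEdit sets, and both functions are hypothetical; each returns the index one past the last byte of the word that starts at Start.

```pascal
program WordCharRules;
{$mode objfpc}{$H+}

type
  TCharSet = set of Char;

const
  // Illustrative stand-ins only; the real SynEdit sets may differ.
  OldWordChars: TCharSet = ['A'..'Z', 'a'..'z'];
  WhiteChars: TCharSet = [#9, ' '];
  BlockChars: TCharSet = ['.', ',', ';', ':', '=', '''', '(', ')', '[', ']'];

// Allow-list rule: a word continues only over explicitly allowed bytes.
function EndByAllowList(const S: string; Start: Integer): Integer;
begin
  Result := Start;
  while (Result <= Length(S)) and (S[Result] in OldWordChars) do
    Inc(Result);
end;

// Block-list rule: a word continues until a white or block char is hit.
function EndByBlockList(const S: string; Start: Integer): Integer;
begin
  Result := Start;
  while (Result <= Length(S)) and not (S[Result] in WhiteChars + BlockChars) do
    Inc(Result);
end;

var
  S: string;
begin
  // 'Aplicação' with the accented letters as raw UTF-8 bytes.
  S := 'Aplica'#$C3#$A7#$C3#$A3'o';
  WriteLn('allow-list word length in bytes: ', EndByAllowList(S, 1) - 1);
  WriteLn('block-list word length in bytes: ', EndByBlockList(S, 1) - 1);
end.
```

With these assumed sets, the allow-list rule stops at the first byte above #127 while the block-list rule runs to the end of the string, which mirrors the trade-off described above: unknown bytes are accepted as word characters rather than treated as breaks.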

> I am also working on that and it ain't that easy, I can tell for sure.

My solution isn't final, but you'd have to rewrite much more to get a
full utf-8 synedit. As I said, I'm not trying to do that.

> You mean like that:
>
> http://svn.freepascal.org/cgi-bin/viewvc.cgi/trunk/lcl/lclproc.pas?root=lazarus&r1=13868&r2=13867&pathrev=13868
>
> ?

No, that's not efficient. We'll need routines that work on the fly,
e.g. ones that report boundaries.



Ales



Re: synedit patch from ales

2008-01-25 Thread Felipe Monteiro de Carvalho
On Jan 25, 2008 6:23 PM, Ales Katona [EMAIL PROTECTED] wrote:
> Yes, and I'm not 100% sure of everything that would constitute (e.g.
> I don't think there's a valid blockchar in multibyte range), but for 99%
> of usages the current blockchars (+ whitechars) which are < 127 seem to
> be working fine.

That's not enough. It already works for ascii characters today.
Please test with both unicode and non-unicode IDE on strings with
accented characters.

I am also working on that and it ain't that easy, I can tell for sure.

> P.S: I think synedit will need a lot more work to be 100% utf-8 ready
> on all fronts. All the set of char things will have to go and we'd
> have to implement utf-8 utf8string[x] operations/functions (afaik fpc
> doesn't have them yet?)

You mean like that:

http://svn.freepascal.org/cgi-bin/viewvc.cgi/trunk/lcl/lclproc.pas?root=lazarus&r1=13868&r2=13867&pathrev=13868

?

-- 
Felipe Monteiro de Carvalho
