Re: synedit patch from ales
I just tested the patch and it doesn't fix selecting UTF-8 words in synedit. That doesn't mean it isn't headed in the right direction. I don't know what is missing; from the description I would expect this to start working.

--
Felipe Monteiro de Carvalho
_
To unsubscribe: mail [EMAIL PROTECTED] with unsubscribe as the Subject
archives at http://www.lazarus.freepascal.org/mailarchives
Re: synedit patch from ales
Another test line:

Caption := 'éé';

It behaves the same as before patching.

--
Felipe Monteiro de Carvalho
Re: synedit patch from ales
Felipe Monteiro de Carvalho wrote:
> I tested with this line (UTF-8 encoded), specifically the last word:
>
> Application.Title:='Minha Aplicação';
>
> Double-clicking on the left part selects Aplica. Clicking on the accented part does nothing, and clicking on o selects only o and puts the caret at the end of the line.
>
> Lazarus is compiled with build+clean and with the option -dWindowsUnicodeSupport
>
> thanks,

What function does double-clicking call? I may have missed some. I'll test it with my spellchecker to see whether it selects a whole word, and if so I'll then try double-clicking.

Thanks, I'll report back soon.

Ales
Re: synedit patch from ales
Oh, and another detail. I am not 100% sure, but I think the changes should be wrapped in ifdef SYN_LAZARUS.

thanks,
--
Felipe Monteiro de Carvalho
Re: synedit patch from ales
Felipe Monteiro de Carvalho wrote:
> I tested with this line (UTF-8 encoded), specifically the last word:
>
> Application.Title:='Minha Aplicação';
>
> Double-clicking on the left part selects Aplica. Clicking on the accented part does nothing, and clicking on o selects only o and puts the caret at the end of the line.
>
> Lazarus is compiled with build+clean and with the option -dWindowsUnicodeSupport
>
> thanks,

I missed a function. If you use the GetWordAtRowCol, GetWordBoundsAtRowCol or NextWordPos functions, you get the whole word and its boundaries. Double-click in synedit uses SetWordBlock (in the lazarus/non-lineselect case), which I didn't look at (and it seems to attempt some ugly UTF-8 conversion, which I guess didn't work).

Ales
Re: synedit patch from ales
Felipe Monteiro de Carvalho wrote:
> I just tested the patch and it doesn't fix selecting UTF-8 words in synedit. That doesn't mean it isn't headed in the right direction. I don't know what is missing; from the description I would expect this to start working.

What words did you test? Can you send me the test sample? I tested on some Slovak accented words (i.e. 2-byte chars) and it worked perfectly.

Ales
Re: synedit patch from ales
On Jan 27, 2008 11:33 AM, Ales Katona [EMAIL PROTECTED] wrote:
> What words did you test? Can you send me the test sample? I tested on some Slovak accented words (i.e. 2-byte chars) and it worked perfectly.

I tested with this line (UTF-8 encoded), specifically the last word:

Application.Title:='Minha Aplicação';

Double-clicking on the left part selects Aplica. Clicking on the accented part does nothing, and clicking on o selects only o and puts the caret at the end of the line.

Lazarus is compiled with build+clean and with the option -dWindowsUnicodeSupport

thanks,
--
Felipe Monteiro de Carvalho
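The three symptoms reported above are what one would expect if the double-click code scans raw bytes against a set of ASCII letters. The Python sketch below is only a hypothetical model of that behaviour, not SynEdit code: the function name and the exact letter set are assumptions made for illustration.

```python
# Hypothetical model, not SynEdit code: grow a selection outward from the
# clicked byte for as long as the neighbouring bytes are ASCII letters.
ASCII_LETTERS = set(range(ord("A"), ord("Z") + 1)) | set(range(ord("a"), ord("z") + 1))


def select_by_allowed_bytes(line: bytes, pos: int) -> bytes:
    """Return the run of allowed bytes around byte offset pos (may be empty)."""
    if line[pos] not in ASCII_LETTERS:
        return b""
    start = pos
    while start > 0 and line[start - 1] in ASCII_LETTERS:
        start -= 1
    end = pos + 1
    while end < len(line) and line[end] in ASCII_LETTERS:
        end += 1
    return line[start:end]


# The test line from the report; the two accented letters are written as escapes.
line = "Application.Title:='Minha Aplica\u00e7\u00e3o';".encode("utf-8")
word_start = line.index(b"Aplica")

left = select_by_allowed_bytes(line, word_start + 2)       # click on the left part
accent = select_by_allowed_bytes(line, word_start + 6)     # click on the first accented byte
tail = select_by_allowed_bytes(line, line.index(b"o';"))   # click on the final o
```

In this model the left part yields `Aplica`, the accented bytes yield nothing, and the trailing `o` is selected alone, because every byte of a multibyte UTF-8 character falls outside the ASCII letter set.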
Re: synedit patch from ales
Hi,

Please define exactly what this patch fixes.

The IDE will eventually only pass UTF-8 to synedit. Then we need a UTF-8 word boundary test.

I committed a partial implementation for that, wrapped in an ifdef:

http://svn.freepascal.org/cgi-bin/viewvc.cgi?view=rev&root=lazarus&revision=13868

thanks,
--
Felipe Monteiro de Carvalho
Re: synedit patch from ales
On Fri, 25 Jan 2008 18:30:11 +0100 Felipe Monteiro de Carvalho [EMAIL PROTECTED] wrote:
> On Jan 25, 2008 6:23 PM, Ales Katona [EMAIL PROTECTED] wrote:
> > Yes, and I'm not 100% sure of everything that would involve (e.g. I don't think there's a valid blockchar in the multibyte range), but for 99% of usages the current blockchars (+ whitechars), which are <= 127, seem to be working fine.
>
> That's not enough. It already works for ASCII characters today. Please test with both the Unicode and non-Unicode IDE on strings with accented characters.
>
> I am also working on that and it ain't that easy, I can tell you for sure.
>
> > P.S: I think synedit will need a lot more work to be 100% UTF-8 ready on all fronts. All the set of char things will have to go and we'd have to implement UTF-8 utf8string[x] operations/functions (afaik fpc doesn't have them yet?)
>
> You mean like that:
> http://svn.freepascal.org/cgi-bin/viewvc.cgi/trunk/lcl/lclproc.pas?root=lazarus&r1=13868&r2=13867&pathrev=13868

I think this is pretty slow and needs too much memory. For example, it grows the Dest array in steps of one, allocating one memory block for each character.

Can you explain what you are trying to achieve? Then we can find a better solution.

Mattias
Re: synedit patch from ales
On Jan 25, 2008 6:48 PM, Mattias Gaertner [EMAIL PROTECTED] wrote:
> I think this is pretty slow and needs too much memory. For example, it grows the Dest array in steps of one, allocating one memory block for each character.
>
> Can you explain what you are trying to achieve? Then we can find a better solution.

I am trying to make the method SetWordBlock work for UTF-8 (I would then add a check to switch between the current code and the new code, so that it works on ANSI too).

If you know any other way to make this work, please suggest it. I thought a lot about this, but I didn't find any other solution.

My first idea was to generate the positions of the characters on the fly. That was so slow one could notice synedit calculating it. I even tried using a buffer to generate the positions, which was also too slow.

--
Felipe Monteiro de Carvalho
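To illustrate the allocation concern raised in this exchange: a table of character positions can be sized in a first pass and filled in a second, so that it is allocated once instead of growing by one element per character. This is a minimal Python sketch under that assumption. The function name is hypothetical, it is not the lclproc.pas code under discussion, and it assumes the input is valid UTF-8.

```python
def utf8_char_starts(data: bytes) -> list[int]:
    """Byte offset of each character in valid UTF-8 data, using one allocation."""
    # Continuation bytes have the bit pattern 10xxxxxx; any other byte starts a character.
    count = sum(1 for b in data if (b & 0xC0) != 0x80)
    starts = [0] * count  # the table is sized once, up front
    i = 0
    for offset, b in enumerate(data):
        if (b & 0xC0) != 0x80:
            starts[i] = offset
            i += 1
    return starts


# "Aplicação" with the two accented letters written as escapes.
data = "Aplica\u00e7\u00e3o".encode("utf-8")
starts = utf8_char_starts(data)
```

With such a table, character index `n` maps to byte offset `starts[n]` in constant time, at the cost of rebuilding the table whenever the line changes.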
Re: synedit patch from ales
On Fri, 25 Jan 2008 17:25:07 +0100 Ales Katona [EMAIL PROTECTED] wrote:
> Mattias Gärtner wrote:
> > The character sets in synedit are 'set of char', which means only 8-bit. So, I guess the patch tries to fix an ANSI codepage accented chars problem, right? The fix is probably useless on other codepages including UTF-8, right?
>
> Not as such. The problem is twofold.
>
> 1. If we ignore encoding (i.e. just work in ANSI space), then the old style was simply plain wrong. It only allowed alpha (not num) chars, and worked on the principle that what's not alpha isn't a word.

True. But at least it is reliable. For which codepages does the patch work, and for which does it not? Maybe the set/check should be configurable.

The IDE will eventually only pass UTF-8 to synedit. Then we need a UTF-8 word boundary test.

> 2. If we also consider UTF-8 encoded content, then getting words by boundaries (i.e. not-allowed chars) and not by allowed chars means that as long as the given boundaries and whitespaces are <= 127 (which the default ones are), UTF-8 words will be parsed right, even if they contain special multibyte chars.
>
> I'm not sure if #2 applies also to some other encoding.

UTF-8 uses #128..#255 for the bytes of multibyte characters. #0..#127 is plain ASCII, like most other 8-bit codepages.

Mattias
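Point 2 can be shown with a short Python sketch: if a word is defined as everything between delimiter bytes, and all delimiters are ASCII, then multibyte UTF-8 characters are never split, because none of their bytes is below 128. The delimiter set below is an assumption made for illustration; the actual contents of TSynWordBlockChars and TSynWhiteChars are not given in this thread.

```python
# Assumed delimiter set for illustration only; every entry is an ASCII byte.
DELIMITERS = set(b" \t.,;:'\"()[]{}=+-*/<>@^#$&!?\\|")


def word_bounds(line: bytes, pos: int) -> tuple[int, int]:
    """Return (start, end) byte offsets of the word containing pos."""
    if line[pos] in DELIMITERS:
        return pos, pos  # clicked on a delimiter: empty selection
    start = pos
    while start > 0 and line[start - 1] not in DELIMITERS:
        start -= 1
    end = pos + 1
    while end < len(line) and line[end] not in DELIMITERS:
        end += 1
    return start, end


# Same test line as in the report; the accented letters are written as escapes.
line = "Application.Title:='Minha Aplica\u00e7\u00e3o';".encode("utf-8")
# Click on the first byte of the c-cedilla, in the middle of the word.
start, end = word_bounds(line, line.index(b"Aplica") + 6)
word = line[start:end].decode("utf-8")
```

Here `word` is the complete `Aplicação`, even though the scan never decodes the text. The same scan would not be safe for encodings in which a byte below 128 can be part of a multibyte character.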
Re: synedit patch from ales
Mattias Gärtner wrote:
> The character sets in synedit are 'set of char', which means only 8-bit. So, I guess the patch tries to fix an ANSI codepage accented chars problem, right? The fix is probably useless on other codepages including UTF-8, right?

Not as such. The problem is twofold.

1. If we ignore encoding (i.e. just work in ANSI space), then the old style was simply plain wrong. It only allowed alpha (not num) chars, and worked on the principle that what's not alpha isn't a word.

2. If we also consider UTF-8 encoded content, then getting words by boundaries (i.e. not-allowed chars) and not by allowed chars means that as long as the given boundaries and whitespaces are <= 127 (which the default ones are), UTF-8 words will be parsed right, even if they contain special multibyte chars.

I'm not sure if #2 applies also to some other encoding.

Ales

> Mattias
Re: synedit patch from ales
Ales Katona wrote:
> Felipe Monteiro de Carvalho wrote:
> > On Jan 25, 2008 6:23 PM, Ales Katona [EMAIL PROTECTED] wrote:
> >
> > That's not enough. It already works for ASCII characters today. Please test with both the Unicode and non-Unicode IDE on strings with accented characters.
>
> ASCII doesn't have accented chars. If you mean a non-UTF local non-latin1 encoding, then pre-patch doesn't work on those; anything outside ['A'-'z'] is considered a block char before my patch.
>
> With my patch, anything NOT listed in TSynWordBlockChars + TSynWhiteChars (if there's no highlighter), or Highlighter.WordBlockChars + TSynWhiteChars, is considered a valid word character. That MIGHT include some nonsense chars, but at least it doesn't block known word chars, and it can be adjusted at runtime (in a way that also supports UTF-8), unlike the current situation, where you simply cannot support UTF-8 because the allowed chars are a set of 8-bit char.

I agree with Ales here. This afternoon Ales and I talked while he was implementing this. The current handling in Synedit is imo plain stupid and will never work once the charset grows bigger than 255 chars. Defining a set of nonword + whitespace chars is imo better. If someone enters a nonsense char and it is considered part of a word, that is imo a lot better than accepting only lower ASCII.

Marc
[lazarus] synedit patch from ales
http://www.hu.freepascal.org/fpcircbot/cgipastebin?msgid=1618

Can someone look into why his mails don't reach the list, please?

thanks
henry
Re: synedit patch from ales
On 25/01/2008, Henry Vermaak [EMAIL PROTECTED] wrote:
> http://www.hu.freepascal.org/fpcircbot/cgipastebin?msgid=1618

From #lazarus-ide: Almindor says it fixes word parsing in synedit, especially for accented chars etc.
Re: synedit patch from ales
Felipe Monteiro de Carvalho wrote:
> On Jan 25, 2008 6:23 PM, Ales Katona [EMAIL PROTECTED] wrote:
>
> That's not enough. It already works for ASCII characters today. Please test with both the Unicode and non-Unicode IDE on strings with accented characters.

ASCII doesn't have accented chars. If you mean a non-UTF local non-latin1 encoding, then pre-patch doesn't work on those; anything outside ['A'-'z'] is considered a block char before my patch.

With my patch, anything NOT listed in TSynWordBlockChars + TSynWhiteChars (if there's no highlighter), or Highlighter.WordBlockChars + TSynWhiteChars, is considered a valid word character. That MIGHT include some nonsense chars, but at least it doesn't block known word chars, and it can be adjusted at runtime (in a way that also supports UTF-8), unlike the current situation, where you simply cannot support UTF-8 because the allowed chars are a set of 8-bit char.

> I am also working on that and it ain't that easy, I can tell you for sure.

My solution isn't final, but you'd have to rewrite much more to get a full UTF-8 synedit. As I said, I'm not trying to do that.

> You mean like that:
> http://svn.freepascal.org/cgi-bin/viewvc.cgi/trunk/lcl/lclproc.pas?root=lazarus&r1=13868&r2=13867&pathrev=13868
> ?

No, that's not efficient. We'll need on-the-fly stuff, e.g. things which will report boundaries etc.

Ales
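One way to read "things which will report boundaries" on the fly is a pair of helpers that snap a byte offset to the enclosing character without building any table. The Python sketch below is an assumption about what such helpers could look like, not an existing FPC or LCL API. Both function names are hypothetical and the input is assumed to be valid UTF-8.

```python
def char_start(data: bytes, pos: int) -> int:
    """Move pos left to the first byte of the UTF-8 character it falls in."""
    # Continuation bytes have the bit pattern 10xxxxxx.
    while pos > 0 and (data[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos


def char_end(data: bytes, pos: int) -> int:
    """Return the offset just past the character that contains pos."""
    pos += 1
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


# "Aplicação" with the two accented letters written as escapes.
data = "Aplica\u00e7\u00e3o".encode("utf-8")
first = char_start(data, 7)  # byte 7 is the second byte of the c-cedilla
last = char_end(data, 7)
```

Each call inspects at most a few neighbouring bytes, since a UTF-8 character is at most four bytes long, so no per-line index has to be maintained.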
Re: synedit patch from ales
On Jan 25, 2008 6:23 PM, Ales Katona [EMAIL PROTECTED] wrote:
> Yes, and I'm not 100% sure of everything that would involve (e.g. I don't think there's a valid blockchar in the multibyte range), but for 99% of usages the current blockchars (+ whitechars), which are <= 127, seem to be working fine.

That's not enough. It already works for ASCII characters today. Please test with both the Unicode and non-Unicode IDE on strings with accented characters.

I am also working on that and it ain't that easy, I can tell you for sure.

> P.S: I think synedit will need a lot more work to be 100% UTF-8 ready on all fronts. All the set of char things will have to go and we'd have to implement UTF-8 utf8string[x] operations/functions (afaik fpc doesn't have them yet?)

You mean like that:
http://svn.freepascal.org/cgi-bin/viewvc.cgi/trunk/lcl/lclproc.pas?root=lazarus&r1=13868&r2=13867&pathrev=13868
?

--
Felipe Monteiro de Carvalho