Re: [lazarus] Non UTF-8 Encodings
On Nov 27, 2007 11:42 PM, Jesus Reyes [EMAIL PROTECTED] wrote:
> The other day I built a tool that takes the txt tables and converts
> them to Pascal constant arrays. Attached is the resulting inc file,
> in case it is needed for something. If I had known something like
> this ucmaps/charset existed, I probably wouldn't have done it (it
> never occurred to me to look into FPC, to be honest).

Can't we just borrow from iconv like the Synapse synachar unit does?

Razvan

_________________________________________
To unsubscribe: mail [EMAIL PROTECTED] with "unsubscribe" as the Subject
archives at http://www.lazarus.freepascal.org/mailarchives
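[Editor's note] A tool like the one Jesus describes could be sketched roughly as below. This is a hypothetical Python sketch, not the actual tool; it assumes the input is a unicode.org-style mapping table (lines of the form `0x80<tab>0x20AC<tab># EURO SIGN`), and the array name, element type, and the choice to leave unmapped bytes at their identity value are all assumptions of the sketch.

```python
import re

def table_to_pascal(name, lines):
    """Convert a unicode.org-style codepage mapping table into a
    Free Pascal constant array of 256 Unicode code points."""
    # Start from the identity mapping; unmapped bytes keep their value
    # (a real tool might emit $FFFD for undefined slots instead).
    mapping = {i: i for i in range(256)}
    for line in lines:
        line = line.split('#', 1)[0].strip()   # drop trailing comments
        m = re.match(r'0x([0-9A-Fa-f]{2})\s+0x([0-9A-Fa-f]{1,6})', line)
        if m:
            mapping[int(m.group(1), 16)] = int(m.group(2), 16)
    cells = ', '.join('$%04X' % mapping[i] for i in range(256))
    return 'const %s: array[0..255] of Word = (%s);' % (name, cells)

print(table_to_pascal('ArrayCP1252', ['0x80\t0x20AC\t# EURO SIGN']))
```

Running this over each `CPxxx.TXT` file and concatenating the results would produce an include file similar in spirit to the attached `encodings.inc`.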
Re: [lazarus] Non UTF-8 Encodings
On Tue, Nov 27, 2007 at 09:42:54AM +0200, Razvan Adrian Bogdan wrote:
> What about the Synapse conversion units? There is a unit, synachar if
> I remember correctly, that carries a lot of charsets and codepages
> internally, converts without iconv support for most of them, and uses
> iconv if the encoding is not supported internally. We could write our
> own such unit using the idea from Synapse; I think they borrowed some
> conversion tables from iconv.

See the charset unit? Or what else uses rtl/ucmaps?
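[Editor's note] The synachar-style approach Razvan describes — internal lookup tables for the common charsets, with iconv only as a fallback — can be illustrated with a small Python sketch. The table excerpt and the function name are hypothetical; a real unit would ship full 256-entry tables per codepage and call out to iconv for charsets it does not know.

```python
# Hypothetical excerpt of a CP1252 high-byte table; a real internal
# table covers all byte values 0x80..0xFF.
CP1252_HIGH = {0x80: 0x20AC, 0x85: 0x2026}  # euro sign, horizontal ellipsis

def to_utf8(data: bytes, table: dict) -> str:
    """Decode single-byte text via an internal table (synachar-style).
    Bytes below 0x80 are plain ASCII; higher bytes are looked up.
    A real unit would fall back to iconv for charsets it has no
    table for, instead of defaulting to the Latin-1 identity below."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))
        else:
            out.append(chr(table.get(b, b)))  # identity = Latin-1 fallback
    return ''.join(out)
```

The appeal of this design is that the common case needs no external library at all, which matters on platforms where iconv (or libc bindings) may be missing.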
Re: [lazarus] Non UTF-8 Encodings
On Nov 27, 2007 11:04 AM, Marco van de Voort [EMAIL PROTECTED] wrote:
> See the charset unit? Or what else uses rtl/ucmaps?

I think I saw it before, but didn't see any use for it. I can't seem to find any conversion tables, just a strange (ugly naming, case, types, logic) skeleton of something that isn't implemented.
Re: [lazarus] Non UTF-8 Encodings
On Tue, 27 Nov 2007 12:40:35 +0200 Razvan Adrian Bogdan [EMAIL PROTECTED] wrote:
> On Nov 27, 2007 11:04 AM, Marco van de Voort [EMAIL PROTECTED] wrote:
> > See the charset unit? Or what else uses rtl/ucmaps?
>
> I think I saw it before, but didn't see any use for it. I can't seem
> to find any conversion tables, just a strange (ugly naming, case,
> types, logic) skeleton of something that isn't implemented.

I think so too; the names should be made more readable. It is the right direction, because it is in the RTL and does not use any slow ansistrings/widestrings. So for a platform-independent, pure Pascal solution this could be used (using the tables of synaser). But I don't know how complete these tables are and how much work it is to maintain them. Nevertheless it should be given a try.

Mattias
Re: [lazarus] Non UTF-8 Encodings
--- Mattias Gaertner [EMAIL PROTECTED] wrote:
> On Tue, 27 Nov 2007 12:40:35 +0200 Razvan Adrian Bogdan
> [EMAIL PROTECTED] wrote:
> > On Nov 27, 2007 11:04 AM, Marco van de Voort [EMAIL PROTECTED] wrote:
> > > See the charset unit? Or what else uses rtl/ucmaps?
> >
> > I think I saw it before, but didn't see any use for it. I can't seem
> > to find any conversion tables, just a strange (ugly naming, case,
> > types, logic) skeleton of something that isn't implemented.
>
> I think so too; the names should be made more readable. It is the
> right direction, because it is in the RTL and does not use any slow
> ansistrings/widestrings. So for a platform-independent, pure Pascal
> solution this could be used (using the tables of synaser). But I don't
> know how complete these tables are and how much work it is to maintain
> them. Nevertheless it should be given a try.
>
> Mattias

The other day I built a tool that takes the txt tables and converts them to Pascal constant arrays. Attached is the resulting inc file, in case it is needed for something. If I had known something like this ucmaps/charset existed, I probably wouldn't have done it (it never occurred to me to look into FPC, to be honest).

Jesus Reyes A.

encodings.inc.gz
Description: 689895700-encodings.inc.gz
Re: [lazarus] Non UTF-8 Encodings
What about the Synapse conversion units? There is a unit, synachar if I remember correctly, that carries a lot of charsets and codepages internally, converts without iconv support for most of them, and uses iconv if the encoding is not supported internally. We could write our own such unit using the idea from Synapse; I think they borrowed some conversion tables from iconv.

Razvan
Re: [lazarus] Non UTF-8 Encodings
> - When a TCodeBuffer loads a file it must find out the encoding,
>   convert it to UTF-8 and on saving convert it back.

OK, this patch may be a start. Of course, lconv should be either extended or changed to something more suitable. Presently I am checking only the first line.

mycp.patch.gz
Description: application/gzip
Re: [lazarus] Non UTF-8 Encodings
On Sun, 25 Nov 2007 15:03:18 +0300 Vasily I. Volchenko [EMAIL PROTECTED] wrote:
> > - When a TCodeBuffer loads a file it must find out the encoding,
> >   convert it to UTF-8 and on saving convert it back.
>
> OK, this patch may be a start. Of course, lconv should be either
> extended or changed to something more suitable. Presently I am
> checking only the first line.

Thanks. TStringList is too slow, and the conversion should not be part of the codetools. The widgetsets have the best access to the system libs, so they have the best encoding converters. I can do that part.

The difficult part is to find out the encoding of a text file. Your proposal of the //encoding comment has the advantage of comments. Because this is a Lazarus feature I suggest using IDE-directive-style comments: {%encoding xxx}.

What about all the existing Pascal files (e.g. coming from Delphi)?

Mattias
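[Editor's note] Recognizing an IDE-directive-style comment such as `{%encoding cp1251}` on the first line is a small parsing job. The sketch below is an illustrative Python version (the function name and the exact accepted syntax are assumptions; the actual directive grammar would be whatever Lazarus defines):

```python
import re

def encoding_from_first_line(line: str):
    """Look for an IDE-directive-style encoding comment on the first
    line of a source file, e.g. {%encoding cp1251}.
    Returns the declared encoding name, or None if absent."""
    m = re.search(r'\{%encoding\s+([\w-]+)\s*\}', line, re.IGNORECASE)
    return m.group(1) if m else None
```

This answers Vasily's "checking only the first line" approach; files without the directive (all existing Delphi sources, as Mattias notes) would still need some other detection strategy.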
Re: [lazarus] Non UTF-8 Encodings
On Nov 25, 2007 1:42 PM, Mattias Gaertner [EMAIL PROTECTED] wrote:
> Thanks. TStringList is too slow, and the conversion should not be part
> of the codetools. The widgetsets have the best access to the system
> libs, so they have the best encoding converters. I can do that part.

Isn't AnsiToUtf8 enough? This way we don't depend on any external libs. A simple loop using ReadLn and WriteLn should be enough.

> The difficult part is to find out the encoding of a text file. Your
> proposal of the //encoding comment has the advantage of comments.
> Because this is a Lazarus feature I suggest using IDE-directive-style
> comments: {%encoding xxx}.

The best option would be using the standard way, which any editor should recognize: a BOM. But we can't use that until the widestring manager is improved. Using a comment will cause trouble opening the file in other editors which are prepared to handle the standard.

-- Felipe Monteiro de Carvalho
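[Editor's note] The BOM detection Felipe refers to amounts to checking the first few bytes of the file against the known Unicode signatures. A minimal Python sketch (the function and encoding names are illustrative, not any Lazarus API):

```python
# Known Unicode BOMs. The UTF-32 entries must come before UTF-16,
# because their byte sequences share a common prefix.
BOMS = [
    (b'\xef\xbb\xbf', 'utf-8'),
    (b'\xff\xfe\x00\x00', 'utf-32-le'),
    (b'\x00\x00\xfe\xff', 'utf-32-be'),
    (b'\xff\xfe', 'utf-16-le'),
    (b'\xfe\xff', 'utf-16-be'),
]

def detect_bom(head: bytes):
    """Return (encoding, bom_length) if the buffer starts with a known
    Unicode BOM, else (None, 0)."""
    for bom, enc in BOMS:
        if head.startswith(bom):
            return enc, len(bom)
    return None, 0
```

As Mattias points out in his reply, this only helps for Unicode encodings, and many UTF-8 files carry no BOM at all.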
Re: [lazarus] Non UTF-8 Encodings
On Sun, 25 Nov 2007 15:17:54 +0100 Felipe Monteiro de Carvalho [EMAIL PROTECTED] wrote:
> Isn't AnsiToUtf8 enough? This way we don't depend on any external libs.

AnsiToUtf8 can be used to convert from the system encoding to UTF-8. And there is no problem depending on external libs if the widgetset already depends on them.

> A simple loop using ReadLn and WriteLn should be enough.

I guess not, but that's an implementation detail.

> The best option would be using the standard way, which any editor
> should recognize: a BOM.

A BOM is only for UTF. There are plenty of files in UTF-8 without a BOM. See below.

> But we can't use that until the widestring manager is improved. Using
> a comment will cause trouble opening the file in other editors which
> are prepared to handle the standard.

That's a normal problem for all text editors and text files without encoding info.

If Lazarus starts to add the BOM for new UTF-8 files, then FPC should not convert the string constants. BTW, what about file functions under Windows?

Mattias
[lazarus] Non UTF-8 Encodings
Goal:
- The IDE should load/save text files in non-UTF-8 encodings.
- The IDE is compiled with a UTF-8 LCL, so all internal functions expect this.
- It is not sufficient to change synedit, because text is often copied between synedit and other tools, and there are many other IDE tools reading/editing text files. The central text loading/saving routine is the TCodeBuffer in the codetools. For performance and consistency almost all functions use these text file caches. There are a few exceptions.
- When a TCodeBuffer loads a file it must find out the encoding, convert it to UTF-8 and on saving convert it back.

Problems:
* How to find out the encoding?
  - BOM
  - lpi file
  - heuristic
  - system charset
* What about buggy text files?

Vasily, your lconv.pas uses libc. AFAIK this unit only exists on a few platforms, for example not on amd64 nor sparc.

Mattias
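[Editor's note] The detection sources Mattias lists (BOM, lpi file, heuristic, system charset) naturally form an ordered chain: try the most reliable signal first, fall through to the least. A hypothetical Python sketch of that order — the function, the "does it decode as UTF-8?" heuristic, and the default charset are all assumptions, not the IDE's actual logic:

```python
def guess_encoding(head: bytes, lpi_encoding=None, system_charset='cp1252'):
    """Apply the proposed detection order: BOM, then the project (.lpi)
    setting, then a heuristic, then the system charset as a last resort."""
    if head.startswith(b'\xef\xbb\xbf'):
        return 'utf-8'                     # 1. UTF-8 BOM is unambiguous
    if lpi_encoding:
        return lpi_encoding                # 2. explicit project setting
    try:
        head.decode('utf-8')               # 3. heuristic: valid UTF-8?
        return 'utf-8'
    except UnicodeDecodeError:
        return system_charset              # 4. fall back to system charset
```

The heuristic step also answers the "buggy text files" concern in part: bytes that are not valid UTF-8 simply fall through to the system charset, which can decode any byte sequence.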
Re: [lazarus] Non UTF-8 Encodings
My lconv.pas can use libc because this unit was added by someone else (in the LCL); I have only fixed the working switch combination. It must work without it. Anyway, lconv.pas is a bad workaround; presently it fits my purposes, but not others'.