Quoting Guy Fink <[email protected]>:

Hello

I have opened issue #0018144 in the bugtracker and uploaded a new version of my codepages unit.

My description of it:

In September we had a discussion on the Lazarus mailing list about rewriting LConvEncoding and moving the functionality to the RTL (thread: rewriting of LConvEncoding).

Since then I have done a lot of coding to implement an efficient algorithm for both single-byte and double-byte codepages. A first release went to the mailing list in mid-October, mainly as a basis for further discussion, but there were no comments or suggestions on it.

So here is a nearly final release with many changes compared to the first version.

It does not compile under FPC 2.4.2:
cp_ISO88591.pas(69,37) Error: Constant strings can't be longer than 255 chars


Major points:
 - The unit supports single- and double-byte codepages through the same functions
 - WideString support (configurable)
 - UTF8 and UTF16 support (UTF16 needs WideStrings)

Great.

 - Direct conversion from CP to CP without intermediate string

Nice.

 - Uppercase and lowercase support
 - Underlying Unicode data as of version 6.0.0 (October 11, 2010)
 - A converter application to convert Unicode definitions to a complete
   Pascal unit. The cp_* units are entirely generated by this app.
 - Conversion up to 80% faster for SBCS.

Ehm, you made many functions inline, even those that are more than a few lines of code. This enlarges the executables and can cost performance in normal applications (e.g. Lazarus). You also call a conversion function for each character. But most real-world texts consist largely of ASCII characters, which need no conversion to UTF-8. My guess is that this approach is slower for most texts, but I have to wait until it compiles before I can test.
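
To illustrate the point about ASCII runs, here is a minimal sketch, assuming ISO-8859-1 input. Latin1ToUTF8 is a hypothetical name used only for this example; it is not a function from the codepages unit and not the LConvEncoding implementation. It copies each run of ASCII bytes with a single Move and only handles bytes >= $80 one at a time:

```pascal
program Latin1FastPath;
{$mode objfpc}{$H+}
{$assertions on}

{ Hypothetical helper for illustration only; not part of the codepages unit. }
function Latin1ToUTF8(const S: AnsiString): AnsiString;
var
  Src, Dst, RunStart, RunLen: SizeInt;
  B: Byte;
begin
  { Worst case: every input byte becomes two output bytes. }
  SetLength(Result, Length(S) * 2);
  Src := 1;
  Dst := 1;
  while Src <= Length(S) do
  begin
    { Fast path: find a run of ASCII bytes and copy it with one Move. }
    RunStart := Src;
    while (Src <= Length(S)) and (Ord(S[Src]) < $80) do
      Inc(Src);
    RunLen := Src - RunStart;
    if RunLen > 0 then
    begin
      Move(S[RunStart], Result[Dst], RunLen);
      Inc(Dst, RunLen);
    end;
    { Slow path: ISO-8859-1 bytes $80..$FF are U+0080..U+00FF,
      which UTF-8 encodes as two bytes. }
    if Src <= Length(S) then
    begin
      B := Ord(S[Src]);
      Result[Dst] := Chr($C0 or (B shr 6));
      Result[Dst + 1] := Chr($80 or (B and $3F));
      Inc(Dst, 2);
      Inc(Src);
    end;
  end;
  { Trim the buffer to the number of bytes actually written. }
  SetLength(Result, Dst - 1);
end;

var
  Utf8: AnsiString;
begin
  { 'caf' + e-acute ($E9 in ISO-8859-1) becomes 'caf' + $C3 $A9 in UTF-8. }
  Utf8 := Latin1ToUTF8('caf'#$E9);
  Assert(Utf8 = 'caf'#$C3#$A9);
  WriteLn(Length(Utf8));
end.
```

Whether this is really faster than an inlined per-character call depends on the text and the compiler, so it would have to be measured.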


 - For DBCS up to 100 times faster

;)


For now there are only units for ISO-8859-1, ISO-8859-2 and CP932 (SHIFT_JIS). More will be added for the final release. The converter subdirectory has all the definition files that I could find; I will add them all.

The units:
  codepages.pp        : the main unit (highly configurable through codepagesdef.inc)
  unicodemappings.pas : some definitions from unicode.org, especially the tables
                        for uppercase, lowercase and the Unicode blocks
  utf8.pas            : mainly the UTF8 functions from LCLProc, plus some new ones
  utf16.pas           : the same for UTF16
  acpinfo.pas         : info for codepages supported by Windows, as published on MSDN

Some first test results are attached.


Mattias




--
_______________________________________________
Lazarus mailing list
[email protected]
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus