Hello
I have opened issue #0018144 in the bugtracker and uploaded a new version of my
codepages unit.
My description there:
In September we had a discussion on the Lazarus mailing list about rewriting
LConvEncoding and moving the functionality to the RTL (thread: rewriting of
LConvEncoding).
Since then I have done a lot of coding to implement an efficient algorithm for
both single-byte and double-byte codepages. A first release was posted to the
mailing list in mid-October, mainly as a basis for further discussion, but there
were no comments or suggestions on it.
So here is a nearly final release with many changes compared to the first
version.
Major points:
- The unit supports single-byte and double-byte codepages through the same
functions
- Widestring support (configurable)
- UTF8 and UTF16 support (UTF16 needs widestrings)
- Direct conversion from CP to CP without an intermediate string
- Uppercase and lowercase support
- Underlying Unicode data as of version 6.0.0 (October 11, 2010)
- A converter application that converts the Unicode definition files to a
complete Pascal unit. The cp_* units are entirely generated by this app.
- Conversion is up to 80% faster than with LConvEncoding for SBCS and up to
100 times faster for DBCS
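To illustrate the "direct conversion from CP to CP" point: a minimal sketch in
Python (not the unit's Pascal code, and not its API) of how two single-byte
codepages can be combined into one 256-entry table, so that converting a string
is a single lookup per byte with no intermediate string. The names
build_direct_table and direct_convert are hypothetical, and the "?" fallback
for unmappable characters is an assumption.

```python
def build_direct_table(src_encoding, dst_encoding, fallback=b"?"):
    """Compose two single-byte codepages into one 256-entry byte table.

    Hypothetical helper for illustration: Python's own codecs stand in
    for the generated mapping tables.
    """
    table = bytearray(256)
    for byte in range(256):
        try:
            # source byte -> Unicode character -> destination byte
            out = bytes([byte]).decode(src_encoding).encode(dst_encoding)
        except (UnicodeDecodeError, UnicodeEncodeError):
            # Assumption: characters missing on either side become "?"
            out = fallback
        table[byte] = out[0]
    return bytes(table)


def direct_convert(data, table):
    """Convert by one table lookup per byte, with no intermediate string."""
    return data.translate(table)


# The table is built once and then reused for every conversion.
LATIN1_TO_LATIN2 = build_direct_table("iso-8859-1", "iso-8859-2")

if __name__ == "__main__":
    source = "Grüße".encode("iso-8859-1")
    print(direct_convert(source, LATIN1_TO_LATIN2))
```

The table costs 256 bytes per codepage pair and is built only once, which is
the usual trade-off behind this kind of lookup-based conversion.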
For now there are only units for ISO-8859-1, ISO-8859-2 and CP932
(SHIFT_JIS). More will be added for the final release. The converter subdir
contains all the definition files that I could find, and I will add units for
all of them.
The units:
codepages.pp: the main unit (highly configurable through codepagesdef.inc)
unicodemappings.pas: some definitions from unicode.org, especially the tables
for uppercase, lowercase and the Unicode blocks
utf8.pas: mainly the UTF8 functions from LCLProc plus some new ones
utf16.pas: the same for UTF16
acpinfo.pas: info on the codepages supported by Windows, as published on MSDN
Some first test results are attached.
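For anyone who wants to check the speed figures against the attached log, here
is a small sketch in Python of the arithmetic. The timings are copied from the
log (decimal commas written as points), each pairing the
LConvEncoding.ConvertEncoding run with the DirectConversion(...):string run of
the same test. The helper names speedup and time_saved_percent are
hypothetical.

```python
def speedup(old_seconds, new_seconds):
    """How many times faster the new run is than the old one."""
    return old_seconds / new_seconds


def time_saved_percent(old_seconds, new_seconds):
    """Percentage of the old running time that the new run saves."""
    return 100.0 * (old_seconds - new_seconds) / old_seconds


# (LConvEncoding.ConvertEncoding, DirectConversion) in seconds, from the log
timings = {
    "ISO-8859-1 >> UTF-8": (0.312, 0.171),
    "ISO-8859-1 >> ISO-8859-2": (0.874, 0.234),
    "SHIFT_JIS >> UTF-8": (25.147, 0.234),
    "UTF8 >> SHIFT_JIS": (23.634, 0.328),
}

if __name__ == "__main__":
    for name, (old, new) in timings.items():
        print(f"{name}: {speedup(old, new):.1f}x faster, "
              f"{time_saved_percent(old, new):.0f}% less time")
```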
Best regards
OS Name:                Microsoft Windows 7 Home Premium
Version:                6.1.7600 Build 7600
System Manufacturer:    Gigabyte Technology Co., Ltd.
System Model:           GA-870A-UD3
System Type:            x64-based PC
Processor:              AMD Phenom(tm) II X6 1090T Processor, 3200 MHz, 6 Core(s), 6 Logical Processor(s)
BIOS Version/Date:      Award Software International, Inc. F1, 15.04.2010
Total Physical Memory:  8,00 GB
Total Virtual Memory:   14,0 GB
Testing character conversion.
System is little endian
Using half translation tables.
Using UTF8-translation tables.
ISO-8859-1 >> UTF-8
Evaluating LConvEncoding.ConvertEncoding(string,iso88591,utf8):string
100000 times, Time: 0,312 [s] : Result is correct.
Evaluating LConvEncoding.ISO_8859_1ToUTF8(string):string
100000 times, Time: 0,250 [s] : Result is correct.
Evaluating DirectConversion(string,chEncISO-8859-1,chEncUTF-8):string
100000 times, Time: 0,171 [s] : Result is correct.
Evaluating ConvertToUTF8(string,chEncISO-8859-1):string
100000 times, Time: 0,234 [s] : Result is correct.
Evaluating ISO_8859_1ToUTF8(string):string
100000 times, Time: 0,234 [s] : Result is correct.
Evaluating DirectConversion(string,chEncISO-8859-1,chEncUTF-8):widestring
100000 times, Time: 0,249 [s] : Result is correct.
Evaluating DirectConversion(string,chEncISO-8859-1,chEncUTF-16):widestring
100000 times, Time: 0,203 [s] : Result is correct.
ISO-8859-2 >> UTF-8
Evaluating LConvEncoding.ConvertEncoding(string,iso88592,utf8):string
100000 times, Time: 0,297 [s] : Result is correct.
Evaluating LConvEncoding.ISO_8859_2ToUTF8(string):string
100000 times, Time: 0,250 [s] : Result is correct.
Evaluating DirectConversion(string,chEncISO-8859-2,chEncUTF-8):string
100000 times, Time: 0,203 [s] : Result is correct.
Evaluating ConvertToUTF8(string,chEncISO-8859-2):string
100000 times, Time: 0,249 [s] : Result is correct.
Evaluating ISO_8859_2ToUTF8(string):string
100000 times, Time: 0,234 [s] : Result is correct.
Evaluating DirectConversion(string,chEncISO-8859-2,chEncUTF-8):widestring
100000 times, Time: 0,265 [s] : Result is correct.
Evaluating DirectConversion(string,chEncISO-8859-2,chEncUTF-16):widestring
100000 times, Time: 0,202 [s] : Result is correct.
ISO-8859-1 >> ISO-8859-2
Evaluating LConvEncoding.ConvertEncoding(string,iso88591,iso88592):string
100000 times, Time: 0,874 [s]
Evaluating DirectConversion(string,chEncISO-8859-1,chEncISO-8859-2):string
100000 times, Time: 0,234 [s]
UTF-16 >> UTF-16BE
Evaluating DirectConversion(widestring,chEncUTF-16,chEncUTF-16BE):widestring
100000 times, Time: 0,234 [s] : Result is correct.
UTF-16 >> UTF-8
Evaluating DirectConversion(widestring,chEncUTF-16,chEncUTF-8):string
100000 times, Time: 0,234 [s] : Result is correct.
SHIFT_JIS >> UTF-8
Evaluating LConvEncoding.ConvertEncoding(string,cp932,utf8):string
1000 times, Time: 25,147 [s]
Length(Result)=22078 Length(Reference)=22173 : 79 characters are differnt.
Evaluating LConvEncoding.CP932ToUTF8(string):string
1000 times, Time: 25,069 [s]
Length(Result)=22078 Length(Reference)=22173 : 79 characters are differnt.
Evaluating DirectConversion(string,chEncCP932,chEncUTF-8):string
1000 times, Time: 0,234 [s] : Result is correct.
Evaluating ConvertToUTF8(string,chEncCP932):string
1000 times, Time: 0,203 [s] : Result is correct.
Evaluating CP932ToUTF8(string):string
1000 times, Time: 0,218 [s] : Result is correct.
UTF8 >> SHIFT_JIS
Evaluating LConvEncoding.ConvertEncoding(string,utf8,cp932):string
1000 times, Time: 23,634 [s] : 370 characters are differnt.
Evaluating LConvEncoding.UTF8ToCp932(string):string
1000 times, Time: 24,321 [s] : 370 characters are differnt.
Evaluating DirectConversion(string,chEncUTF-8,chEncCP932):string
1000 times, Time: 0,328 [s] : Result is correct.
Evaluating UTF8ToCp932(string):string
1000 times, Time: 0,343 [s] : Result is correct.