Hello

I have opened issue #0018144 in the bugtracker and uploaded a new version of my 
codepages unit.

My description there:

In September we had a discussion on the Lazarus mailing list about rewriting 
LConvEncoding and moving the functionality to the RTL (thread: rewriting of 
LConvEncoding).

Since then I have done a lot of coding to implement an efficient algorithm, for 
both single-byte and double-byte codepages. A first release went to the mailing 
list in mid-October, mainly as a basis for further discussion, but there were no 
comments or suggestions on it.

So here is a nearly final release with many changes compared to the first version.

Major points:
 - The unit supports single- and double-byte codepages through the same functions
 - Widestring support (configurable)
 - UTF8 and UTF16 support (UTF16 needs widestrings)
 - Direct conversion from CP to CP without an intermediate string
 - Uppercase and lowercase support
 - Underlying Unicode data as of version 6.0.0 (October 11, 2010)
 - A converter application that converts Unicode definitions to a complete
   Pascal unit. The cp_* units are entirely generated by this app.
 - Conversion up to 80% faster for SBCS, and up to 100 times faster for DBCS
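
To illustrate the "direct conversion from CP to CP" point for the single-byte
case, here is a minimal sketch in Python. It is not the code of the codepages
unit (which is Pascal); the names build_direct_table and direct_convert are
hypothetical, and Python's built-in codecs stand in for the generated cp_*
tables. The idea shown is only that the two mappings (source to Unicode,
Unicode to target) can be composed once into a single 256-entry table, so the
conversion itself needs no intermediate Unicode string.

```python
# Sketch only: direct single-byte codepage-to-codepage conversion through one
# precomputed 256-entry table. Names are hypothetical; Python's codecs are
# used here as a stand-in for generated mapping tables.

def build_direct_table(src: str, dst: str, fallback: bytes = b"?") -> bytes:
    """Compose src->Unicode and Unicode->dst once into a byte-to-byte table."""
    table = bytearray(256)
    for b in range(256):
        try:
            out = bytes([b]).decode(src).encode(dst)
        except (UnicodeDecodeError, UnicodeEncodeError):
            out = fallback  # byte has no counterpart in the target codepage
        if len(out) != 1:
            out = fallback  # this sketch only handles single-byte targets
        table[b] = out[0]
    return bytes(table)


def direct_convert(data: bytes, table: bytes) -> bytes:
    """Convert with one table lookup per byte, no intermediate string."""
    return data.translate(table)


# Built once, reused for every conversion.
LATIN1_TO_LATIN2 = build_direct_table("iso-8859-1", "iso-8859-2")

if __name__ == "__main__":
    text = "Grüße".encode("iso-8859-1")
    print(direct_convert(text, LATIN1_TO_LATIN2))
```

The table is built once, so the per-call cost is a single pass over the input.
Double-byte codepages such as CP932 need a different table layout and are not
covered by this sketch.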



For now there are only units for ISO-8859-1, ISO-8859-2 and CP932 
(SHIFT_JIS). More will be added for the final release. The converter subdirectory 
has all the definition files that I could find; I will add them all.

The units:
  codepages.pp : the main unit (highly configurable through codepagesdef.inc)
  unicodemappings.pas : some definitions from unicode.org, especially the tables
                        for uppercase, lowercase and the Unicode blocks
  utf8.pas : mainly the UTF8 functions from LCLProc, plus some new ones
  utf16.pas: the same for UTF16
  acpinfo.pas: info on the codepages supported by Windows, as published on MSDN

Some first test results are attached.


Best regards



OS Name Microsoft Windows 7 Home Premium
Version 6.1.7600 Build 7600
System Manufacturer     Gigabyte Technology Co., Ltd.
System Model            GA-870A-UD3
System Type             x64-based PC
Processor               AMD Phenom(tm) II X6 1090T Processor, 3200 Mhz, 6 Core(s), 6 Logical Processor(s)
BIOS Version/Date       Award Software International, Inc. F1, 15.04.2010
Total Physical Memory   8,00 GB
Total Virtual Memory    14,0 GB

Testing character conversion.
System is little endian
Using half translation tables.
Using UTF8-translation tables.
ISO-8859-1 >> UTF-8
Evaluating LConvEncoding.ConvertEncoding(string,iso88591,utf8):string        
100000 times, Time: 0,312 [s] : Result is correct.
Evaluating LConvEncoding.ISO_8859_1ToUTF8(string):string                     
100000 times, Time: 0,250 [s] : Result is correct.
Evaluating DirectConversion(string,chEncISO-8859-1,chEncUTF-8):string        
100000 times, Time: 0,171 [s] : Result is correct.
Evaluating ConvertToUTF8(string,chEncISO-8859-1):string                      
100000 times, Time: 0,234 [s] : Result is correct.
Evaluating ISO_8859_1ToUTF8(string):string                                   
100000 times, Time: 0,234 [s] : Result is correct.
Evaluating DirectConversion(string,chEncISO-8859-1,chEncUTF-8):widestring    
100000 times, Time: 0,249 [s] : Result is correct.
Evaluating DirectConversion(string,chEncISO-8859-1,chEncUTF-16):widestring   
100000 times, Time: 0,203 [s] : Result is correct.

ISO-8859-2 >> UTF-8
Evaluating LConvEncoding.ConvertEncoding(string,iso88592,utf8):string        
100000 times, Time: 0,297 [s] : Result is correct.
Evaluating LConvEncoding.ISO_8859_2ToUTF8(string):string                     
100000 times, Time: 0,250 [s] : Result is correct.
Evaluating DirectConversion(string,chEncISO-8859-2,chEncUTF-8):string        
100000 times, Time: 0,203 [s] : Result is correct.
Evaluating ConvertToUTF8(string,chEncISO-8859-2):string                      
100000 times, Time: 0,249 [s] : Result is correct.
Evaluating ISO_8859_2ToUTF8(string):string                                   
100000 times, Time: 0,234 [s] : Result is correct.
Evaluating DirectConversion(string,chEncISO-8859-2,chEncUTF-8):widestring    
100000 times, Time: 0,265 [s] : Result is correct.
Evaluating DirectConversion(string,chEncISO-8859-2,chEncUTF-16):widestring   
100000 times, Time: 0,202 [s] : Result is correct.

ISO-8859-1 >> ISO-8859-2
Evaluating LConvEncoding.ConvertEncoding(string,iso88591,iso88592):string    
100000 times, Time: 0,874 [s]
Evaluating DirectConversion(string,chEncISO-8859-1,chEncISO-8859-2):string   
100000 times, Time: 0,234 [s]

UTF-16 >> UTF-16BE
Evaluating DirectConversion(widestring,chEncUTF-16,chEncUTF-16BE):widestring 
100000 times, Time: 0,234 [s] : Result is correct.

UTF-16 >> UTF-8
Evaluating DirectConversion(widestring,chEncUTF-16,chEncUTF-8):string        
100000 times, Time: 0,234 [s] : Result is correct.

SHIFT_JIS >> UTF-8
Evaluating LConvEncoding.ConvertEncoding(string,cp932,utf8):string           
1000 times, Time: 25,147 [s]
  Length(Result)=22078 Length(Reference)=22173 : 79 characters are differnt.
Evaluating LConvEncoding.CP932ToUTF8(string):string                          
1000 times, Time: 25,069 [s]
  Length(Result)=22078 Length(Reference)=22173 : 79 characters are differnt.
Evaluating DirectConversion(string,chEncCP932,chEncUTF-8):string             
1000 times, Time: 0,234 [s] : Result is correct.
Evaluating ConvertToUTF8(string,chEncCP932):string                           
1000 times, Time: 0,203 [s] : Result is correct.
Evaluating CP932ToUTF8(string):string                                        
1000 times, Time: 0,218 [s] : Result is correct.

UTF8 >> SHIFT_JIS
Evaluating LConvEncoding.ConvertEncoding(string,utf8,cp932):string           
1000 times, Time: 23,634 [s] : 370 characters are differnt.
Evaluating LConvEncoding.UTF8ToCp932(string):string                          
1000 times, Time: 24,321 [s] : 370 characters are differnt.
Evaluating DirectConversion(string,chEncUTF-8,chEncCP932):string             
1000 times, Time: 0,328 [s] : Result is correct.
Evaluating UTF8ToCp932(string):string                                        
1000 times, Time: 0,343 [s] : Result is correct.
--
_______________________________________________
Lazarus mailing list
[email protected]
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus
