Re: [Lazarus] rewriting of LConvEncoding

Guy Fink Thu, 23 Sep 2010 02:12:00 -0700

--- Original Message ---
From: Marco van de Voort <[email protected]>
To: [email protected], Lazarus mailing list <[email protected]>
Subject: Re: [Lazarus] rewriting of LConvEncoding

> On Mon, Sep 20, 2010 at 11:16:39PM +0200, [email protected] wrote:
> > > Note that FPC does contain some conversion code in the RTL,
> > > see the ucmaps directory and the charset unit.
> >
> > The files in ucmaps are the files from ftp.unicode.org that I
> > mentioned.
> >
> > My opinion is that static tables are smaller and faster than
> > dynamic mappings (especially for the Asian codepages), but memory
> > could be an issue.  Perhaps this can be solved by smartlinking.
>
> Explain static/dynamic in this context. Note that there is a tool to
> change the conversions into tables, so they can be linked in, already.

By the tool, do you mean the charset unit? As far as I can see, charset is
not finished. It only offers a rudimentary way to read in the unicode.org
text files, plus some functions to find a mapping and convert a single
character. There is no support for complete string conversions, nor for
UTF-8, UTF-16 or UTF-32.

The tables are created dynamically via getmem and stored in a linked list.
Every character is stored in a tunicodecharmapping record, in which the
Unicode value is only defined as a word, not a cardinal. Thus UTF-32 is not
supported, and neither are UTF-16 surrogates. The list is scanned linearly
to find a given mapping, which is not really optimal for fast conversions.
The back-conversion from Unicode to character is even worse than the
algorithm in LConvEncoding: it searches linearly through the mapping, which
may take some time on big Asian codepages. (LConvEncoding uses a dual
Quicksearch.)
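
To make the cost difference concrete, here is a minimal sketch of a
back-conversion that uses a table sorted by Unicode value and a plain binary
search instead of a linear scan. This is neither the code from charset nor
the one from LConvEncoding; TRevEntry, RevTable and FindCode are names made
up for this example, and the table only holds seven entries taken from
CP1252.

```pascal
program ReverseLookupSketch;
{$mode objfpc}

type
  // Hypothetical record: one Unicode value and its codepage byte.
  TRevEntry = record
    Unicode: Cardinal;  // Cardinal, so values above $FFFF would fit too
    Code: Byte;
  end;

const
  // Excerpt of CP1252, sorted by Unicode value (required for the search).
  RevTable: array[0..6] of TRevEntry = (
    (Unicode: $0192; Code: $83),
    (Unicode: $201A; Code: $82),
    (Unicode: $201E; Code: $84),
    (Unicode: $2020; Code: $86),
    (Unicode: $2021; Code: $87),
    (Unicode: $2026; Code: $85),
    (Unicode: $20AC; Code: $80)
  );

// Binary search: O(log n) comparisons instead of O(n) for a linear scan.
function FindCode(U: Cardinal; out Code: Byte): Boolean;
var
  L, R, M: Integer;
begin
  Code := 0;
  Result := False;
  L := Low(RevTable);
  R := High(RevTable);
  while L <= R do
  begin
    M := (L + R) div 2;
    if RevTable[M].Unicode = U then
    begin
      Code := RevTable[M].Code;
      Exit(True);
    end
    else if RevTable[M].Unicode < U then
      L := M + 1
    else
      R := M - 1;
  end;
end;

var
  Code: Byte;
begin
  if FindCode($20AC, Code) then
    WriteLn('U+20AC -> $', HexStr(Code, 2))
  else
    WriteLn('U+20AC is not in the table');
end.
```

For a codepage with several thousand entries, the search needs around a
dozen comparisons per character, where a linear scan needs on average half
the table.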

Charset has absolutely no support for handling the endianness of UTF-16 and
UTF-32 strings.
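
As a rough sketch of what such support would have to do for UTF-16: read
each 16-bit code unit with an explicit byte order, and pick that order from
a byte order mark if one is present. TByteOrder, DetectByteOrder and
ReadUtf16Unit are hypothetical names for this example, not part of charset
or LConvEncoding.

```pascal
program Utf16ByteOrderSketch;
{$mode objfpc}

type
  TByteOrder = (boLittleEndian, boBigEndian);

// Looks for a UTF-16 byte order mark (U+FEFF) in the first two bytes.
// FE FF means big endian, FF FE means little endian; otherwise the
// caller-supplied fallback is returned.
function DetectByteOrder(const Buf: array of Byte;
  Fallback: TByteOrder): TByteOrder;
begin
  Result := Fallback;
  if Length(Buf) >= 2 then
  begin
    if (Buf[0] = $FE) and (Buf[1] = $FF) then
      Result := boBigEndian
    else if (Buf[0] = $FF) and (Buf[1] = $FE) then
      Result := boLittleEndian;
  end;
end;

// Assembles one 16-bit code unit from two bytes, independent of the
// byte order of the machine the program runs on.
// The caller must make sure that Index + 1 is inside Buf.
function ReadUtf16Unit(const Buf: array of Byte; Index: Integer;
  Order: TByteOrder): Word;
begin
  if Order = boBigEndian then
    Result := (Word(Buf[Index]) shl 8) or Buf[Index + 1]
  else
    Result := (Word(Buf[Index + 1]) shl 8) or Buf[Index];
end;

const
  // Big-endian BOM followed by one code unit (U+20AC).
  Sample: array[0..3] of Byte = ($FE, $FF, $20, $AC);

var
  Order: TByteOrder;
begin
  Order := DetectByteOrder(Sample, boLittleEndian);
  WriteLn('code unit: $', HexStr(ReadUtf16Unit(Sample, 2, Order), 4));
end.
```

UTF-32 would work the same way with four bytes per unit; surrogate pairs
are not handled in this sketch.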

By static tables I mean a table in a const section, compiled and linked into
the code. This gives the compiler optimization possibilities for indexed
access to the table content. The tables can be optimized for the best-fitting
data size: Latin codepages can mostly be encoded as UCS-2, while others,
especially Asian codepages but also some Mac codepages, need UCS-4. The
back-conversion can be optimized through individual functions for every
codepage.
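
A minimal sketch of what I have in mind, assuming a small excerpt of CP1252
(bytes $80..$87 only) instead of a full codepage. The names CP1252_80_87,
ByteToUnicode and UnicodeToByte are invented for this example; a real
generated unit would cover the whole codepage.

```pascal
program StaticTableSketch;
{$mode objfpc}

const
  Unassigned = $FFFF;  // marker chosen for this sketch

  // Static table in a const section. Word (UCS-2) is enough for this
  // codepage; a codepage needing UCS-4 would use Cardinal instead.
  CP1252_80_87: array[$80..$87] of Word = (
    $20AC, Unassigned, $201A, $0192, $201E, $2026, $2020, $2021);

// Forward conversion: a direct indexed access into the table.
function ByteToUnicode(B: Byte): Word;
begin
  if B < $80 then
    Result := B                   // ASCII range maps to itself
  else if B <= $87 then
    Result := CP1252_80_87[B]
  else
    Result := Unassigned;         // outside the excerpt of this sketch
end;

// Back-conversion written as an individual function for this codepage.
// The compiler is free to turn the case statement into a jump table or
// a comparison tree.
function UnicodeToByte(U: Word; out B: Byte): Boolean;
begin
  B := 0;
  Result := True;
  case U of
    $0000..$007F: B := Byte(U);
    $0192: B := $83;
    $201A: B := $82;
    $201E: B := $84;
    $2020: B := $86;
    $2021: B := $87;
    $2026: B := $85;
    $20AC: B := $80;
  else
    Result := False;              // no mapping in this excerpt
  end;
end;

var
  B: Byte;
begin
  WriteLn('$80 -> U+', HexStr(ByteToUnicode($80), 4));
  if UnicodeToByte($2026, B) then
    WriteLn('U+2026 -> $', HexStr(B, 2));
end.
```

Nothing is allocated at runtime here, and with smartlinking a table that is
never referenced should not end up in the executable.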

It is easy to write a utility which creates the source code for such
codepage units.

Best regards

______________________________________________________
powered by GLOBER.LU
Luxembourg Internet Service Provider
Hosting. Domain Registration, Webshops, Webdesign, FreeMail ...

Our professional Web Hosting plans include all the features you are looking for 
at the best possible price.
www.globe.lu

--
_______________________________________________
Lazarus mailing list
[email protected]
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus
