Re: [Ecls-list] Unicode 16-bits

Daniel Herring Mon, 21 Feb 2011 19:34:24 -0800

On Sat, 19 Feb 2011, Juan Jose Garcia-Ripoll wrote:

> Would you find it useful to have an ECL that only supports character codes 0 
> - 65535? That would make it probably easier to embed the part of the Unicode 
> database associated to it (< 65535 bytes) and have a standalone executable.
> Executables would also be a bit faster and use less memory (16-bits vs 
> 32-bits per character)


...

Not sure I follow.  For many people, that would be fine; but it's a subset 
of Unicode and could cause confusion when it breaks.

Lately I've heard several fairly knowledgeable people say UTF-8 really is 
ideal.  While UTF-32 allows immediate indexing to a given codepoint, that 
doesn't help with common tasks due to combining marks and such.

They appear to be supported by (or have subverted) Wikipedia.
http://en.wikipedia.org/wiki/Utf-32
http://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings#Processing_issues
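
To make the combining-marks point concrete, here is a small sketch in Python (chosen only for illustration; it has nothing to do with ECL's internals). It shows that one user-visible character can span two codepoints, so constant-time indexing by codepoint still does not index by "character". The helper split_clusters is a hypothetical name and only a rough approximation: it attaches combining marks to the preceding base character and does not implement full grapheme cluster segmentation.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: displayed as one character.
s = "e\u0301"

code_points = len(s)                      # 2 codepoints
utf8_bytes = len(s.encode("utf-8"))       # 1 + 2 = 3 bytes
utf32_bytes = len(s.encode("utf-32-le"))  # 2 * 4 = 8 bytes (no BOM)

# Indexing by codepoint returns the bare 'e', not the accented letter.
first = s[0]

def split_clusters(text):
    """Group each base character with the combining marks that follow it.

    Rough approximation only: a codepoint with a nonzero canonical
    combining class is attached to the previous cluster.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

clusters = split_clusters("e\u0301x")     # two clusters: accented e, then x
```

Whatever the encoding, code that cares about user-visible characters ends up scanning the string anyway, which is the argument made for UTF-8 above.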

As for the database, you can always split it into separately loadable 
chunks and throw an error if a chunk is not available when needed.
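
A minimal sketch of that idea, again in Python and purely as an illustration. Everything here is an assumption: the names ChunkedDatabase and MissingChunkError, the chunk size of one plane (65536 codepoints), and the use of in-memory loader functions in place of real files. It is not how ECL stores its Unicode data.

```python
CHUNK_SIZE = 0x10000  # assumption: one chunk per Unicode plane

class MissingChunkError(LookupError):
    """Raised when the chunk covering a codepoint is not available."""

class ChunkedDatabase:
    def __init__(self, loaders):
        # loaders maps a chunk index to a zero-argument callable that
        # returns {codepoint: name}; a real system might read a file here.
        self._loaders = loaders
        self._cache = {}

    def lookup(self, code_point):
        index = code_point // CHUNK_SIZE
        if index not in self._cache:
            loader = self._loaders.get(index)
            if loader is None:
                raise MissingChunkError(
                    f"no data chunk available for U+{code_point:04X}")
            # Load the chunk lazily, the first time it is needed.
            self._cache[index] = loader()
        return self._cache[index].get(code_point)

# Only the first chunk (codes 0 - 65535) is shipped in this example.
db = ChunkedDatabase({0: lambda: {0x41: "LATIN CAPITAL LETTER A"}})
name = db.lookup(0x41)
```

With this layout a standalone executable could embed only chunk 0, and a lookup outside it (for example U+1F600) fails loudly with MissingChunkError instead of silently misbehaving.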

- Daniel

_______________________________________________
Ecls-list mailing list
Ecls-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ecls-list
