On Sat, 19 Feb 2011, Juan Jose Garcia-Ripoll wrote:
> Would you find it useful to have an ECL that only supports character codes 0
> - 65535? That would make it probably easier to embed the part of the Unicode
> database associated to it (< 65535 bytes) and have a standalone executable.
> Executables would also be a bit faster and use less memory (16-bits vs
> 32-bits per character)
...

Not sure I follow. For many people that would be fine, but it's a subset of Unicode and could cause confusion when it breaks, e.g. on any character outside the 0 - 65535 range.

Lately I've heard several fairly knowledgeable people say UTF-8 really is ideal. While UTF-32 allows immediate indexing to a given code point, that doesn't help with common tasks, because combining marks and such mean that one code point is not always one user-visible character. They appear to be supported by (or have subverted) Wikipedia.

http://en.wikipedia.org/wiki/Utf-32
http://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings#Processing_issues

As for the database, you can always split it into separately loadable chunks and throw an error if a chunk is not available when needed.

- Daniel
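
P.S. A rough sketch of the two technical points, written in Python only for brevity; it is not ECL code and says nothing about how ECL stores its tables. The names ChunkedDatabase, ChunkNotAvailable, demo_loader and the one-chunk-per-plane split are all made up for illustration.

```python
import unicodedata

# Point 1: indexing by code point is cheap, but a code point is not
# always one user-visible character because of combining marks.
composed = "\u00e9"      # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def same_text(a: str, b: str) -> bool:
    """Compare two strings after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Point 2: a hypothetical character database split into loadable chunks.
CHUNK_SIZE = 0x10000  # assumption: one chunk per Unicode plane


class ChunkNotAvailable(LookupError):
    """Raised when the chunk covering a code point was not shipped."""


class ChunkedDatabase:
    def __init__(self, loader):
        # loader is a hypothetical callable: plane number -> dict, or None
        self._loader = loader
        self._chunks = {}

    def lookup(self, code_point: int):
        plane = code_point // CHUNK_SIZE
        if plane not in self._chunks:
            chunk = self._loader(plane)  # load lazily, on first use
            if chunk is None:
                raise ChunkNotAvailable(f"no data for plane {plane}")
            self._chunks[plane] = chunk
        return self._chunks[plane].get(code_point)


def demo_loader(plane: int):
    # Toy loader: pretend only plane 0 (codes 0 - 65535) is embedded.
    if plane == 0:
        return {0x41: "LATIN CAPITAL LETTER A"}
    return None
```

With this toy loader, a lookup of code 0x41 succeeds, while a lookup of anything above 65535 raises ChunkNotAvailable instead of silently misbehaving. The two strings in the first part look identical on screen but differ in length and only compare equal after normalization, regardless of how many bits each code point occupies.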