On 06/13/2012 02:59 PM, Igor Stasenko wrote:
> On 13 June 2012 10:31, Philippe Marschall
> <philippe.marsch...@netcetera.ch> wrote:
>> On 06/13/2012 04:44 AM, Igor Stasenko wrote:
>>> Hi, hardcore hackers.
>>> Please take a look at the code and tell me if it can be improved.
>>>
>>> The AsmJit snippet below transforms a Unicode integer value
>>> into a 1..4-byte UTF-8 sequence.
>>>
>>> Then the outer piece of code (which is not yet written) will
>>> accumulate the results of this snippet
>>> to do memory-aligned (4-byte) writes.
>>> That way, if 4 Unicode characters can be encoded into 4 UTF-8 bytes
>>> (which is mostly the case for the Latin-1 char range), there will be
>>> 4 memory reads (to read four 32-bit Unicode values) but only a single
>>> memory write (to write four 8-bit UTF-8 encoded values).
>>>
>>> The idea is to make UTF-8 encoding speed close to memory copying speed :)
>>
>> In Seaside we use another trick that Andreas Raab came up with. The
>> assumption is that most of the strings are ASCII [1]. We use a CharSet /
>> bitmap to quickly scan the string for the index of the first non-ASCII
>> character. If we find none we just answer the argument. No copying at all.
>>
>
> Well, in my case I will need copying because I need to null-terminate it,
> to represent it as a null-terminated string.
> This is what the cairo library expects as input for rendering text.
> And this also means that I can use a single buffer for conversions to
> avoid generating garbage, i.e.
> I take the input string, convert it to UTF-8 in a private buffer, then pass
> that buffer as input to the external call;
> on the next call the input can be any other string, but the output will be
> the same private buffer.
> I will need to allocate a new buffer if the incoming string does not
> fit into it.
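The AsmJit snippet itself does not appear in this message. As a reference point, here is a minimal C sketch of the pieces Igor describes, under stated assumptions: the input is an array of 32-bit code points, and the names `utf8_encode`, `utf8_buf` and `to_utf8_cstr` are hypothetical. It shows the per-character 1..4-byte encoding, a simplified form of the batching idea (four ASCII code points emitted with one 4-byte write, without tracking alignment), and a reusable NUL-terminated buffer that is reallocated only when the input does not fit. Note that only code points below U+0080 encode to a single byte; Latin-1 characters from U+0080 to U+00FF take two.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Encode one code point as UTF-8 into out[0..3].
 * Returns the byte count (1..4), or 0 for surrogates and values above U+10FFFF. */
size_t utf8_encode(uint32_t cp, uint8_t *out)
{
    if (cp < 0x80) {                          /* 0xxxxxxx */
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {                         /* 110xxxxx 10xxxxxx */
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)         /* UTF-16 surrogates: not encodable */
        return 0;
    if (cp < 0x10000) {                       /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                     /* 11110xxx + three 10xxxxxx */
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical reusable output buffer: one per caller, reused across calls. */
typedef struct {
    uint8_t *data;
    size_t cap;
} utf8_buf;

/* Convert n code points to a NUL-terminated UTF-8 string held in b.
 * Returns b->data on success, NULL on invalid input or allocation failure.
 * The result stays valid only until the next call with the same buffer. */
const char *to_utf8_cstr(utf8_buf *b, const uint32_t *s, size_t n)
{
    if (n > (SIZE_MAX - 1) / 4)
        return NULL;
    size_t need = 4 * n + 1;                  /* worst case: 4 bytes each, plus NUL */
    if (need > b->cap) {                      /* grow only when the input does not fit */
        uint8_t *p = realloc(b->data, need);
        if (p == NULL)
            return NULL;
        b->data = p;
        b->cap = need;
    }
    size_t i = 0, j = 0;
    while (i < n) {
        /* OR-ing four code points gives < 0x80 only if all four are ASCII. */
        if (n - i >= 4 && (s[i] | s[i + 1] | s[i + 2] | s[i + 3]) < 0x80) {
            uint8_t four[4] = { (uint8_t)s[i], (uint8_t)s[i + 1],
                                (uint8_t)s[i + 2], (uint8_t)s[i + 3] };
            memcpy(b->data + j, four, 4);     /* compilers typically emit one 4-byte store */
            i += 4;
            j += 4;
        } else {
            size_t k = utf8_encode(s[i], b->data + j);
            if (k == 0)
                return NULL;
            i += 1;
            j += k;
        }
    }
    b->data[j] = '\0';
    return (const char *)b->data;
}
```

For example, converting the code points of "café" produces the bytes 63 61 66 C3 A9 followed by a NUL. Reserving 4*n + 1 bytes up front over-allocates for mostly-ASCII text but keeps the inner loop free of bounds checks; an exact-length first pass is the alternative if buffer size matters more than speed.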
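The Seaside fast path Philippe describes can be illustrated in C under a few assumptions: the input is an 8-bit Latin-1 byte string (roughly the Smalltalk ByteString case), a plain `c >= 0x80` comparison stands in for the CharSet/bitmap scan, and the names `first_non_ascii` and `latin1_to_utf8` are hypothetical. Because pure ASCII is already valid UTF-8, the argument is answered unchanged when no non-ASCII byte is found; a copy is made only otherwise.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Index of the first byte >= 0x80 in s[0..n), or n if the string is pure ASCII.
 * Stands in for Seaside's CharSet/bitmap scan. */
size_t first_non_ascii(const unsigned char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (s[i] >= 0x80)
            return i;
    return n;
}

/* Latin-1 to UTF-8 with an ASCII fast path.
 * Pure ASCII is returned as s itself (no copy). Otherwise a malloc'd buffer
 * is returned that the caller must free; compare the result with s to tell
 * the two cases apart. Returns NULL on allocation failure. */
const unsigned char *latin1_to_utf8(const unsigned char *s, size_t n, size_t *out_len)
{
    size_t first = first_non_ascii(s, n);
    if (first == n) {
        *out_len = n;
        return s;
    }
    if (n > SIZE_MAX / 2)
        return NULL;
    unsigned char *out = malloc(2 * n);       /* each Latin-1 byte needs at most 2 */
    if (out == NULL)
        return NULL;
    memcpy(out, s, first);                    /* the ASCII prefix is copied verbatim */
    size_t j = first;
    for (size_t i = first; i < n; i++) {
        unsigned char c = s[i];
        if (c < 0x80) {
            out[j++] = c;
        } else {                              /* U+0080..U+00FF: 110000xx 10xxxxxx */
            out[j++] = (unsigned char)(0xC0 | (c >> 6));
            out[j++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    *out_len = j;
    return out;
}
```

The fast path returns the caller's bytes as they are, with no terminator added, which is why it does not carry over to the cairo case in this thread: a NUL-terminated buffer still has to be written even when the input is pure ASCII.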
I see, different use case.

Cheers
Philippe