Re: Why UTF-8/16 character encodings?

Walter Bright Sat, 25 May 2013 15:05:26 -0700

On 5/25/2013 2:51 PM, Walter Bright wrote:

On 5/25/2013 12:51 PM, Joakim wrote:

For a multi-language string encoding, the header would
contain a single byte for every language used in the string, along with multiple
index bytes to signify the start and finish of every run of single-language
characters in the string. So, a list of languages and a list of pure
single-language substrings.
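
For readers following along, one possible reading of the proposed header can be sketched in C. This is only an illustration under stated assumptions: the names `lang_run`, `ml_string` and `ml_lang_at` are hypothetical, the run boundaries are stored as `size_t` character indices rather than packed index bytes, and the payload is assumed to be one byte per character. None of this layout is specified in the proposal itself.

```c
#include <stddef.h>
#include <stdint.h>

/* One run of consecutive single-language characters (hypothetical layout). */
struct lang_run {
    uint8_t lang;   /* language id: one byte per language, as proposed */
    size_t  start;  /* index of the first character in the run */
    size_t  end;    /* index one past the last character in the run */
};

/* Hypothetical string: a list of runs plus the character payload.
   Assumption: each character is a single byte within its language. */
struct ml_string {
    const struct lang_run *runs;
    size_t nruns;
    const uint8_t *chars;
    size_t nchars;
};

/* Returns the language id of character i, or -1 if no run covers it. */
int ml_lang_at(const struct ml_string *s, size_t i) {
    for (size_t r = 0; r < s->nruns; r++)
        if (i >= s->runs[r].start && i < s->runs[r].end)
            return s->runs[r].lang;
    return -1;
}
```

Under this reading, a byte in `chars` has no meaning until its run has been looked up, so any search routine would have to consult the header as well as the payload.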


Please implement the simple C function strstr() with this simple scheme, and
post it here.

http://www.digitalmars.com/rtl/string.html#strstr

I'll go first. Here's a simple UTF-8 version in C. It's not the fastest way to do it, but at least it is correct:

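----------------------------------
#include <string.h>  /* strlen, memcmp, size_t, NULL */

/* Byte-wise substring search; works unchanged on UTF-8 input. */
char *strstr(const char *s1, const char *s2) {
    size_t len1 = strlen(s1);
    size_t len2 = strlen(s2);
    if (!len2)
        return (char *) s1;      /* empty needle matches at the start */
    char c2 = *s2;               /* first byte of the needle */
    while (len2 <= len1) {       /* stop once the needle can no longer fit */
        if (c2 == *s1)
            if (memcmp(s2, s1, len2) == 0)
                return (char *) s1;
        s1++;
        len1--;
    }
    return NULL;
}
----------------------------------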
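
As a quick check that a plain byte-wise search like this handles multi-byte characters, here is a minimal sketch of the same algorithm with a small offset helper. The names `utf8_find` and `utf8_find_offset` are hypothetical, chosen only to avoid clashing with the C library's own `strstr`; the string literals spell out the UTF-8 bytes with hex escapes so the example does not depend on the source file's encoding.

```c
#include <stddef.h>
#include <string.h>

/* Same algorithm as in the post; the name utf8_find is hypothetical and
   only avoids a clash with the C library's strstr. */
const char *utf8_find(const char *s1, const char *s2) {
    size_t len1 = strlen(s1);
    size_t len2 = strlen(s2);
    if (!len2)
        return s1;                      /* empty needle matches at the start */
    for (; len2 <= len1; s1++, len1--)  /* slide one byte at a time */
        if (*s1 == *s2 && memcmp(s1, s2, len2) == 0)
            return s1;
    return NULL;
}

/* Byte offset of needle within haystack, or -1 if it does not occur. */
long utf8_find_offset(const char *haystack, const char *needle) {
    const char *p = utf8_find(haystack, needle);
    return p ? (long)(p - haystack) : -1;
}
```

For example, `utf8_find_offset("na\xC3\xAFve caf\xC3\xA9", "caf\xC3\xA9")` returns 7, the byte offset of "café" in "naïve café". No decoding step is needed because UTF-8 lead bytes and continuation bytes occupy disjoint ranges, so a well-formed needle cannot match starting in the middle of another character.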
