Did you try automatic converters (with some manual rework at the end)?

e.g. http://rvelthuis.de/programs/convertpack.html
http://rvelthuis.de/articles/articles-convert.html

Hope this helps, 
PSlllllva - CSW

-----Original Message-----
From: Delphi [mailto:[email protected]] On Behalf Of Ross Levis
Sent: 7 July 2016 06:09
To: 'Moderated List for the Discussion of Delphi Programming excluding 
Database-related topics' <[email protected]>
Subject: C++ function convert to Delphi

I'm hoping someone with C++ knowledge and some spare time can convert this 
function to Delphi/Pascal for me.  I don't have any C++ knowledge.

It is a method to establish with "some" degree of certainty if text is UTF-8 
encoded as opposed to a European character set with extended characters.

Much appreciated!

 

int isUTF8(const char *data, size_t size)
{
    const unsigned char *str = (unsigned char*)data;
    const unsigned char *end = str + size;
    unsigned char byte;
    unsigned int code_length, i;
    uint32_t ch;
    while (str != end) {
        byte = *str;
        if (byte <= 0x7F) {
            /* 1 byte sequence: U+0000..U+007F */
            str += 1;
            continue;
        }

        if (0xC2 <= byte && byte <= 0xDF)
            /* 0b110xxxxx: 2 bytes sequence */
            code_length = 2;
        else if (0xE0 <= byte && byte <= 0xEF)
            /* 0b1110xxxx: 3 bytes sequence */
            code_length = 3;
        else if (0xF0 <= byte && byte <= 0xF4)
            /* 0b11110xxx: 4 bytes sequence */
            code_length = 4;
        else {
            /* invalid first byte of a multibyte character */
            return 0;
        }

        if (str + (code_length - 1) >= end) {
            /* truncated string or invalid byte sequence */
            return 0;
        }

        /* Check continuation bytes: bit 7 should be set, bit 6 should be
         * unset (b10xxxxxx). */
        for (i=1; i < code_length; i++) {
            if ((str[i] & 0xC0) != 0x80)
                return 0;
        }

        if (code_length == 2) {
            /* 2 bytes sequence: U+0080..U+07FF */
            ch = ((str[0] & 0x1f) << 6) + (str[1] & 0x3f);
            /* str[0] >= 0xC2, so ch >= 0x0080.
               str[0] <= 0xDF, (str[1] & 0x3f) <= 0x3f, so ch <= 0x07ff */
        } else if (code_length == 3) {
            /* 3 bytes sequence: U+0800..U+FFFF */
            ch = ((str[0] & 0x0f) << 12) + ((str[1] & 0x3f) << 6) +
                  (str[2] & 0x3f);
            /* (0xff & 0x0f) << 12 | (0xff & 0x3f) << 6 | (0xff & 0x3f) = 0xffff,
               so ch <= 0xffff */
            if (ch < 0x0800)
                return 0;

            /* surrogates (U+D800-U+DFFF) are invalid in UTF-8:
               test if (0xD800 <= ch && ch <= 0xDFFF) */
            if ((ch >> 11) == 0x1b)
                return 0;
        } else if (code_length == 4) {
            /* 4 bytes sequence: U+10000..U+10FFFF */
            ch = ((str[0] & 0x07) << 18) + ((str[1] & 0x3f) << 12) +
                 ((str[2] & 0x3f) << 6) + (str[3] & 0x3f);
            if ((ch < 0x10000) || (0x10FFFF < ch))
                return 0;
        }
        str += code_length;
    }
    return 1;
}
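
For anyone checking a translation of the function above, it may help to have a few known inputs to compare against. The following is only a sketch: is_utf8_sketch and utf8_sketch_examples are hypothetical names, and the body is a compact rearrangement of the same checks (lead byte range, truncation, continuation bytes, overlong forms, surrogates, upper limit), not the original code. The sample byte strings are assumptions chosen to show why the test gives only "some" certainty: plain ASCII passes, and it is equally valid in a European single-byte character set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compact variant of the isUTF8 checks quoted above.
   Returns 1 if the buffer is well-formed UTF-8, otherwise 0. */
static int is_utf8_sketch(const char *data, size_t size)
{
    const unsigned char *s = (const unsigned char *)data;
    size_t pos = 0;

    while (pos < size) {
        unsigned char lead = s[pos];
        size_t len, i;
        uint32_t ch, min;

        if (lead <= 0x7F) {                         /* ASCII, 1 byte */
            pos++;
            continue;
        } else if (lead >= 0xC2 && lead <= 0xDF) {  /* 2-byte sequence */
            len = 2; ch = lead & 0x1F; min = 0x80;
        } else if (lead >= 0xE0 && lead <= 0xEF) {  /* 3-byte sequence */
            len = 3; ch = lead & 0x0F; min = 0x800;
        } else if (lead >= 0xF0 && lead <= 0xF4) {  /* 4-byte sequence */
            len = 4; ch = lead & 0x07; min = 0x10000;
        } else {
            return 0;                               /* invalid lead byte */
        }

        if (size - pos < len)
            return 0;                               /* truncated sequence */

        for (i = 1; i < len; i++) {
            /* continuation bytes must look like 10xxxxxx */
            if ((s[pos + i] & 0xC0) != 0x80)
                return 0;
            ch = (ch << 6) | (uint32_t)(s[pos + i] & 0x3F);
        }

        if (ch < min || ch > 0x10FFFF)
            return 0;                               /* overlong or out of range */
        if (ch >= 0xD800 && ch <= 0xDFFF)
            return 0;                               /* surrogate code point */

        pos += len;
    }
    return 1;
}

/* Sample inputs (assumed for illustration) with the results expected
   from the checks above. */
static void utf8_sketch_examples(void)
{
    assert(is_utf8_sketch("hello", 5) == 1);             /* ASCII only */
    assert(is_utf8_sketch("caf\xC3\xA9", 5) == 1);       /* e-acute as UTF-8 */
    assert(is_utf8_sketch("caf\xE9 au lait", 12) == 0);  /* e-acute as Latin-1 */
    assert(is_utf8_sketch("caf\xE9", 4) == 0);           /* truncated sequence */
    assert(is_utf8_sketch("\xC0\x80", 2) == 0);          /* invalid lead byte */
    assert(is_utf8_sketch("\xED\xA0\x80", 3) == 0);      /* surrogate U+D800 */
    assert(is_utf8_sketch("\xF0\x9F\x98\x80", 4) == 1);  /* U+1F600 */
}
```

A Delphi/Pascal translation could be run against the same byte sequences; if it disagrees on any of them, the shifts and masks are the first place to look.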

 

 

Regards,

Ross.

_______________________________________________
Delphi mailing list
[email protected]
http://lists.elists.org/cgi-bin/mailman/listinfo/delphi