[Warning: rather long. But useful, I hope.] Adriano wrote:
> Can someone look at StringToGuid/GuidToString and confirm that they are
> broken for big-endian (the code branch when legacy == false)?

Unfortunately, I don't have access to a big-endian machine. But the code of those functions is flawed for both big- and little-endian machines.

GuidToString uses this format string:

    {%02hx%02hx%02hx%02hx-%02hx%02hx-%02hx%02hx-%02hx%02hx-%02hx%02hx%02hx%02hx%02hx%02hx}

and creates the ASCII representation thusly:

    sprintf(buffer, GUID_NEW_FORMAT_UPPER,
        USHORT(guid->data[0] & 0xFF), USHORT(guid->data[0] >> 8),
        USHORT(guid->data[1] & 0xFF), USHORT(guid->data[1] >> 8),
        USHORT(guid->data[2] & 0xFF), USHORT(guid->data[2] >> 8),
        USHORT(guid->data[3] & 0xFF), USHORT(guid->data[3] >> 8),
        USHORT(guid->data[4] & 0xFF), USHORT(guid->data[4] >> 8),
        USHORT(guid->data[5] & 0xFF), USHORT(guid->data[5] >> 8),
        USHORT(guid->data[6] & 0xFF), USHORT(guid->data[6] >> 8),
        USHORT(guid->data[7] & 0xFF), USHORT(guid->data[7] >> 8));

It reads the guid structure as a sequence of 8 words (which it isn't), and then writes out each 'word' LSB first. This is wrong: network order is MSB first.

On big-endian machines, this means that every 2 adjacent bytes are swapped.

On little-endian machines, funny enough, the second half of the string turns out right. It consists of 1-byte fields. They are read as two-byte words, but because little-endian words are LSB-first, the sprintf puts them back in the right order. 2-byte words and 4-byte dwords are printed with their bytes reversed on little-endian machines.

(You may notice that the above claim is not consistent with the observed behaviour of UUID_TO_CHAR. I'll get to that a little later.)
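Since I can't test on real big-endian hardware, here is a small sketch that emulates the word-wise reading described above for either byte order. The function name legacyGuidToString and the hostIsBigEndian flag are made up for this illustration, the braces of the real format string are left out, and the input is assumed to be the 16 bytes of the guid struct exactly as they sit in memory.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Emulates the word-wise formatting for a chosen host byte order.
// mem: the 16 bytes of the guid struct as laid out in memory (assumption).
std::string legacyGuidToString(const std::uint8_t mem[16], bool hostIsBigEndian)
{
    assert(mem != nullptr);
    std::string out;
    char pair[5];
    for (int i = 0; i < 8; ++i)
    {
        // Load 16-bit "word" i the way a host of the given byte order would.
        const unsigned word = hostIsBigEndian
            ? (unsigned(mem[2 * i]) << 8) | mem[2 * i + 1]
            : (unsigned(mem[2 * i + 1]) << 8) | mem[2 * i];

        // Like the code under discussion: low byte first, then high byte.
        std::snprintf(pair, sizeof pair, "%02X%02X", word & 0xFF, word >> 8);
        out += pair;

        // Dashes after bytes 4, 6, 8 and 10, i.e. after words 1..4.
        if (i >= 1 && i <= 4)
            out += '-';
    }
    return out;
}
```

Feeding it the in-memory image of GUID 11223344-5566-7788-9900-AABBCCDDEEFF gives 44332211-6655-8877-9900-AABBCCDDEEFF for the little-endian layout (44 33 22 11 66 55 88 77 99 00 AA ...) and 22114433-6655-8877-0099-BBAADDCCFFEE for the big-endian layout (11 22 33 44 55 66 ...), which is what the reasoning above predicts.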
To get GuidToString right, the format string should be:

    {%08lX-%04hX-%04hX-%02hX%02hX-%02hX%02hX%02hX%02hX%02hX%02hX}

and the command:

    sprintf(buffer, GUID_RIGHT_FORMAT_UPPER,
        guid->data1, guid->data2, guid->data3,
        guid->data4[0], guid->data4[1], guid->data4[2], guid->data4[3],
        guid->data4[4], guid->data4[5], guid->data4[6], guid->data4[7]);

This works on both big- and little-endian machines. No need to do any byte-position juggling ourselves: the compiler knows the byte order.

What goes for GuidToString also goes for StringToGuid. They are each other's complement (or rather: inverse function).

Now to evlUuidToChar. This function reads the 16-char OCTETS string and produces the 36-char ASCII string. At a certain point, it creates a GUID record like this:

    const FB_GUID* guid = reinterpret_cast<const FB_GUID*>(data);

That is not the right way, because the data are in network order (at least they should be - this is the 16-char string). So guid will be wrong on little-endian machines and right on big-endian machines. More precisely, on little-endian machines:

- data1, data2 and data3 all have their bytes reversed (i.e. not in little-endian host order);
- data4 is OK, because this is an array of *bytes*.

Then, for UUID_TO_CHAR:

    case funUuidBroken:
        GuidToString(buffer, guid, false);
        break;

So here, the network-order byte string is fed to GuidToString, which reads it as a series of host-order words and then swaps each word's bytes before outputting them to the 36-char ASCII string.

The effect on little-endian machines is that all the multi-byte (d)words, which are reversed in guid, are put right again by the flawed GuidToString. And the bytes of the array, which are already in the right place, are also output correctly by GuidToString (as shown earlier).
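To illustrate the two fixes argued for above (format the struct field by field, and don't cast the network-order octets straight to a struct), here is a minimal sketch. The Guid struct is only a stand-in for FB_GUID using the field names from this mail, guidFromOctets and guidToString are hypothetical names, the braces are omitted, and I use %04X/%02X with unsigned arguments instead of the h modifier, which should print the same thing.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Stand-in for FB_GUID (assumed layout, field names as used in this mail).
struct Guid
{
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t data4[8];
};

// Build host-order fields from 16 network-order (MSB-first) octets.
// Shifts work on values, so this is independent of the host byte order.
Guid guidFromOctets(const std::uint8_t b[16])
{
    assert(b != nullptr);
    Guid g;
    g.data1 = (std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16) |
              (std::uint32_t(b[2]) << 8) | std::uint32_t(b[3]);
    g.data2 = static_cast<std::uint16_t>((b[4] << 8) | b[5]);
    g.data3 = static_cast<std::uint16_t>((b[6] << 8) | b[7]);
    for (int i = 0; i < 8; ++i)
        g.data4[i] = b[8 + i];
    return g;
}

// Field-wise formatting: printf emits each integer MSB first on any host.
std::string guidToString(const Guid& g)
{
    char buf[37]; // 36 characters plus the terminating NUL
    std::snprintf(buf, sizeof buf,
        "%08lX-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X",
        static_cast<unsigned long>(g.data1),
        unsigned(g.data2), unsigned(g.data3),
        unsigned(g.data4[0]), unsigned(g.data4[1]),
        unsigned(g.data4[2]), unsigned(g.data4[3]),
        unsigned(g.data4[4]), unsigned(g.data4[5]),
        unsigned(g.data4[6]), unsigned(g.data4[7]));
    return buf;
}
```

With the octets 11 22 33 44 55 66 77 88 99 00 AA BB CC DD EE FF, guidToString(guidFromOctets(octets)) yields 11223344-5566-7788-9900-AABBCCDDEEFF, and nothing in the sketch depends on the host's byte order.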
On my little-endian machine:

    select UUID_TO_CHAR(x'11223344556677889900AABBCCDDEEFF') from rdb$database
    -> 11223344-5566-7788-9900-AABBCCDDEEFF

However, on big-endian machines, GuidToString swaps every pair of bytes in the guid struct (which already has the correct order for those machines), so the output will be wrong there.

For UUID_TO_CHAR2:

    case funUuidRfc:
        sprintf(buffer, GUID_NEW_FORMAT_LOWER,
            USHORT((guid->data1 >> 24) & 0xFF), USHORT((guid->data1 >> 16) & 0xFF),
            USHORT((guid->data1 >> 8) & 0xFF), USHORT(guid->data1 & 0xFF),
            USHORT((guid->data2 >> 8) & 0xFF), USHORT(guid->data2 & 0xFF),
            USHORT((guid->data3 >> 8) & 0xFF), USHORT(guid->data3 & 0xFF),
            USHORT(guid->data4[0]), USHORT(guid->data4[1]),
            USHORT(guid->data4[2]), USHORT(guid->data4[3]),
            USHORT(guid->data4[4]), USHORT(guid->data4[5]),
            USHORT(guid->data4[6]), USHORT(guid->data4[7]));
        break;

Because data1, data2 and data3 are in network order but the code expects them in host order, on little-endian machines the bytes in those fields will be reversed in the output string. The bytes in the array are fine. Indeed:

    select UUID_TO_CHAR2(x'11223344556677889900AABBCCDDEEFF') from rdb$database
    -> 44332211-6655-8877-9900-aabbccddeeff

Mind you: because of the swap in data3, combined with the flaw in GEN_UUID, the output of UUID_TO_CHAR2 *looks* fine on little-endian machines, because the 4 (version number) appears in the right position.

On big-endian machines, UUID_TO_CHAR2 should work fine, because big-endian host order is the same as network order (and also "natural" order, the way we write our binary, octal, decimal and hexadecimal numbers).

BTW, I didn't look at evlCharToUuid, but I guess similar things are happening there, because on little-endians CHAR_TO_UUID works fine despite the flaws in StringToGuid, and CHAR_TO_UUID2 doesn't.

So, all in all, our five UUID functions are all flawed on at least one type of platform.
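The UUID_TO_CHAR2 behaviour can also be reproduced without a big-endian machine. The sketch below emulates the struct cast followed by the shift-based formatting for a chosen byte order. The names loadAsHost and uuidToChar2Emulated and the hostIsBigEndian flag are invented for this illustration, and the braces are again left out.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Value that a host of the given byte order sees when it reads
// the n bytes at p as a single integer (n <= 4).
static std::uint32_t loadAsHost(const std::uint8_t* p, int n, bool hostIsBigEndian)
{
    std::uint32_t v = 0;
    for (int i = 0; i < n; ++i)
    {
        // Visit bytes from most to least significant for that host.
        const int idx = hostIsBigEndian ? i : n - 1 - i;
        v = (v << 8) | p[idx];
    }
    return v;
}

// Emulates casting 16 network-order octets to the guid struct and then
// formatting data1..data3 with shifts, as the funUuidRfc branch does.
std::string uuidToChar2Emulated(const std::uint8_t octets[16], bool hostIsBigEndian)
{
    assert(octets != nullptr);
    const std::uint32_t data1 = loadAsHost(octets, 4, hostIsBigEndian);
    const std::uint32_t data2 = loadAsHost(octets + 4, 2, hostIsBigEndian);
    const std::uint32_t data3 = loadAsHost(octets + 6, 2, hostIsBigEndian);
    const std::uint8_t* data4 = octets + 8; // plain bytes, no reordering

    char buf[37];
    std::snprintf(buf, sizeof buf,
        "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-%02x%02x%02x%02x%02x%02x",
        unsigned((data1 >> 24) & 0xFF), unsigned((data1 >> 16) & 0xFF),
        unsigned((data1 >> 8) & 0xFF), unsigned(data1 & 0xFF),
        unsigned((data2 >> 8) & 0xFF), unsigned(data2 & 0xFF),
        unsigned((data3 >> 8) & 0xFF), unsigned(data3 & 0xFF),
        unsigned(data4[0]), unsigned(data4[1]), unsigned(data4[2]),
        unsigned(data4[3]), unsigned(data4[4]), unsigned(data4[5]),
        unsigned(data4[6]), unsigned(data4[7]));
    return buf;
}
```

For the octets 11 22 33 ... FF this gives 44332211-6655-8877-9900-aabbccddeeff with hostIsBigEndian == false, matching what I see on my machine, and 11223344-5566-7788-9900-aabbccddeeff with hostIsBigEndian == true.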
But it's not hard to get them right, with code that is even simpler than what we have now (and without the need for the '2' functions).

Cheers,
Paul Vinkenoog
(why do I always do these things at night?)

Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel