[Warning: rather long. But useful, I hope.]

Adriano wrote:

> Can someone look at StringToGuid/GuidToString and confirm that they are
> broken for big-endian (the code branch when legacy == false)?

Unfortunately, I don't have access to a big-endian machine. But the code of 
those functions is flawed for both big- and little-endian machines.

GuidToString uses this format string:

  {%02hx%02hx%02hx%02hx-%02hx%02hx-%02hx%02hx-%02hx%02hx-%02hx%02hx%02hx%02hx%02hx%02hx}

and creates the ASCII representation thusly:

  sprintf(buffer, GUID_NEW_FORMAT_UPPER,
          USHORT(guid->data[0] & 0xFF), USHORT(guid->data[0] >> 8),
          USHORT(guid->data[1] & 0xFF), USHORT(guid->data[1] >> 8),
          USHORT(guid->data[2] & 0xFF), USHORT(guid->data[2] >> 8),
          USHORT(guid->data[3] & 0xFF), USHORT(guid->data[3] >> 8),
          USHORT(guid->data[4] & 0xFF), USHORT(guid->data[4] >> 8),
          USHORT(guid->data[5] & 0xFF), USHORT(guid->data[5] >> 8),
          USHORT(guid->data[6] & 0xFF), USHORT(guid->data[6] >> 8),
          USHORT(guid->data[7] & 0xFF), USHORT(guid->data[7] >> 8));

It reads the guid structure as a sequence of eight 16-bit words (which it 
isn't), and then writes out each 'word' least significant byte first. This is 
wrong: network order is most significant byte first.

On big-endian machines, this means that every 2 adjacent bytes are swapped.

On little-endian machines, funnily enough, the second half of the string turns 
out right. It consists of 1-byte fields. They are read as two-byte words, but 
because little-endian words are stored LSB-first, the sprintf puts them back in 
the right order.

The 2-byte words and the 4-byte dword, however, are printed with their bytes 
reversed on little-endian machines.
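
For readers without a big-endian machine at hand, the word-wise formatting can 
be simulated for either byte order. This is only a sketch: formatWordWise and 
appendHexByte are hypothetical helpers, not Firebird code, the surrounding 
braces of the format string are left out, and the simulated byte order is 
passed in as a flag.

```cpp
#include <string>

// Hypothetical helper: appends one byte as two uppercase hex digits.
static void appendHexByte(std::string& out, unsigned byte)
{
    static const char digits[] = "0123456789ABCDEF";
    out += digits[(byte >> 4) & 0xF];
    out += digits[byte & 0xF];
}

// Hypothetical helper, not Firebird code: formats a 16-byte memory image the
// way the word-wise sprintf call does, for a simulated host byte order.
std::string formatWordWise(const unsigned char mem[16], bool bigEndianHost)
{
    std::string out;
    for (int i = 0; i < 8; ++i)
    {
        const unsigned first = mem[2 * i];
        const unsigned second = mem[2 * i + 1];
        // The value the host loads for the 16-bit word stored at this position.
        const unsigned word = bigEndianHost ? (first << 8) | second
                                            : (second << 8) | first;
        // The flawed call prints the low byte first, then the high byte.
        appendHexByte(out, word & 0xFF);
        appendHexByte(out, word >> 8);
        // Dashes follow bytes 4, 6, 8 and 10, i.e. words 1 through 4.
        if (i >= 1 && i <= 4)
            out += '-';
    }
    return out;
}
```

With the little-endian flag the output is simply the memory image in order, so 
a host-order struct whose image is 44 33 22 11 66 55 88 77 99 00 AA ... comes 
out as 44332211-6655-8877-9900-AABBCCDDEEFF. With the big-endian flag every 
pair of bytes is swapped.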

(You may notice that the above claim is not consistent with the observed 
behaviour of UUID_TO_CHAR. I'll get to that a little later.)

To get GuidToString right, the format string should be:

  {%08lX-%04hX-%04hX-%02hX%02hX-%02hX%02hX%02hX%02hX%02hX%02hX}

and the command:

  sprintf(buffer, GUID_RIGHT_FORMAT_UPPER,
          guid->data1, guid->data2, guid->data3,
          guid->data4[0], guid->data4[1], guid->data4[2], guid->data4[3],
          guid->data4[4], guid->data4[5], guid->data4[6], guid->data4[7]);

This works on both big- and little-endian machines. No need to do any 
byte-position juggling ourselves: the compiler knows the byte order.

What goes for GuidToString also goes for StringToGuid: the two functions are 
each other's inverse.
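
A matching parser can be sketched with sscanf in the same field-by-field 
style. Again, the Guid struct and the stringToGuid signature are assumptions 
made for illustration, not the actual Firebird declarations.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in for the field view of FB_GUID.
struct Guid
{
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t data4[8];
};

// Parses "{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}" into host-order fields.
// Returns false if fewer than 11 fields could be read. Note that sscanf
// cannot report whether the closing brace was present.
bool stringToGuid(const char* text, Guid& g)
{
    unsigned long d1;
    unsigned short d2, d3;
    unsigned char d4[8];
    const int fields = std::sscanf(text,
        "{%8lx-%4hx-%4hx-%2hhx%2hhx-%2hhx%2hhx%2hhx%2hhx%2hhx%2hhx}",
        &d1, &d2, &d3, &d4[0], &d4[1], &d4[2], &d4[3],
        &d4[4], &d4[5], &d4[6], &d4[7]);
    if (fields != 11)
        return false;

    g.data1 = static_cast<std::uint32_t>(d1);
    g.data2 = d2;
    g.data3 = d3;
    for (int i = 0; i < 8; ++i)
        g.data4[i] = d4[i];
    return true;
}
```

The temporaries keep the sscanf argument types exact, whatever the platform's 
fixed-width typedefs happen to be.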


Now to evlUuidToChar. This function reads the 16-char OCTETS string and 
produces the 36-char ASCII string.

At a certain point, it creates a GUID record like this:

  const FB_GUID* guid = reinterpret_cast<const FB_GUID*>(data);

That is not the right way, because the data are in network order (at least they 
should be, since this is the 16-char OCTETS string). So guid will be wrong on 
little-endian machines and right on big-endian machines.
More precisely, on little-endian machines:
- data1, data2 and data3 all have their bytes reversed (i.e. not in 
little-endian host order);
- data4 is OK, because this is an array of *bytes*.
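
A byte-order-independent alternative to the cast is to assemble the fields 
from the network-order bytes with shifts. This is a minimal sketch; the Guid 
struct and guidFromNetworkBytes are hypothetical names, not Firebird code.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the field view of FB_GUID.
struct Guid
{
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t data4[8];
};

// Builds host-order fields from 16 bytes in network (MSB-first) order.
// Shifts operate on values, so the result is the same on any host.
Guid guidFromNetworkBytes(const unsigned char* p)
{
    Guid g;
    g.data1 = (static_cast<std::uint32_t>(p[0]) << 24) |
              (static_cast<std::uint32_t>(p[1]) << 16) |
              (static_cast<std::uint32_t>(p[2]) << 8) |
               static_cast<std::uint32_t>(p[3]);
    g.data2 = static_cast<std::uint16_t>((p[4] << 8) | p[5]);
    g.data3 = static_cast<std::uint16_t>((p[6] << 8) | p[7]);
    // data4 is an array of bytes, so it can be copied as-is.
    std::memcpy(g.data4, p + 8, 8);
    return g;
}
```

For the bytes 11 22 33 44 55 66 77 88 ..., this yields data1 = 0x11223344, 
data2 = 0x5566 and data3 = 0x7788 on every machine, whereas the cast yields 
0x44332211, 0x6655 and 0x8877 on a little-endian one.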

Then, for UUID_TO_CHAR:

  case funUuidBroken:
    GuidToString(buffer, guid, false);
    break;

So here, the network-order byte string is fed to GuidToString, which reads it 
as a series of host-order words and then swaps each word's bytes before 
outputting them to the 36-char ASCII string.

The effect on little-endian machines is that all the multi-byte (d)words, which 
are reversed in guid, are put right again by the flawed GuidToString. And the 
bytes of the array, which are already in the right place, are also output 
correctly by GuidToString (as shown earlier).

On my little-endian machine:

  select UUID_TO_CHAR(x'11223344556677889900AABBCCDDEEFF') from rdb$database
  -> 11223344-5566-7788-9900-AABBCCDDEEFF

However, on big-endian machines, GuidToString swaps every pair of bytes in the 
guid struct (which already has the correct order for those machines), so the 
output will be wrong there.

For UUID_TO_CHAR2:

  case funUuidRfc:
    sprintf(buffer, GUID_NEW_FORMAT_LOWER,
    USHORT((guid->data1 >> 24) & 0xFF), USHORT((guid->data1 >> 16) & 0xFF),
    USHORT((guid->data1 >> 8) & 0xFF), USHORT(guid->data1 & 0xFF),
    USHORT((guid->data2 >> 8) & 0xFF), USHORT(guid->data2 & 0xFF),
    USHORT((guid->data3 >> 8) & 0xFF), USHORT(guid->data3 & 0xFF),
    USHORT(guid->data4[0]), USHORT(guid->data4[1]),
    USHORT(guid->data4[2]), USHORT(guid->data4[3]),
    USHORT(guid->data4[4]), USHORT(guid->data4[5]),
    USHORT(guid->data4[6]), USHORT(guid->data4[7]));
    break;

Because data1, data2 and data3 are in network order but the code expects them 
in host order, on little-endian machines the bytes in those fields will be 
reversed in the output string. The bytes in the array are fine.

Indeed:

   select UUID_TO_CHAR2(x'11223344556677889900AABBCCDDEEFF') from rdb$database
   -> 44332211-6655-8877-9900-aabbccddeeff

Mind you: because of the swap in data3, combined with the flaw in GEN_UUID, the 
output of UUID_TO_CHAR2 *looks* fine on little-endian machines, because the 4 
(version number) appears in the right position.

On big-endian machines, UUID_TO_CHAR2 should work fine, because big-endian host 
order is the same as network order (and also "natural" order, the way we write 
our binary, octal, decimal and hexadecimal numbers).


BTW, I didn't look at evlCharToUuid, but I guess similar things are happening 
there, because on little-endian machines CHAR_TO_UUID works fine despite the 
flaws in StringToGuid, and CHAR_TO_UUID2 doesn't.


So, all in all, our five UUID functions are all flawed on at least one type of 
platform. But it's not hard to get them right, with code that is even simpler 
than what we have now (and without the need for the '2' functions).
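
To illustrate how simple this can be, here is one possible approach, offered 
as an assumption rather than as the intended fix: since the OCTETS value is 
already in network order, the 36-char string can be produced by printing the 
16 bytes in sequence, with no struct and no byte-order handling at all. 
uuidOctetsToChar is a hypothetical name.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats 16 network-order bytes as the 36-character
// text form. Bytes are printed in sequence, so host byte order is irrelevant.
std::string uuidOctetsToChar(const unsigned char o[16])
{
    char buf[37]; // 36 characters plus the terminating NUL
    std::snprintf(buf, sizeof buf,
        "%02X%02X%02X%02X-%02X%02X-%02X%02X-%02X%02X-"
        "%02X%02X%02X%02X%02X%02X",
        o[0], o[1], o[2], o[3], o[4], o[5], o[6], o[7],
        o[8], o[9], o[10], o[11], o[12], o[13], o[14], o[15]);
    return buf;
}
```

For x'11223344556677889900AABBCCDDEEFF' this gives 
11223344-5566-7788-9900-AABBCCDDEEFF on any machine.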


Cheers,
Paul Vinkenoog

(why do I always do these things at night?)
