Hi,

Pete Batard wrote:
>   strncpy(cpy_result, p_psz_out, i_inlen);

Known as nitpicker i want to to point out that this would avoid a memory
corruption in case of overflow but would also truncate the name,
potentially to an incomplete UTF-8 byte sequence at the end.

I add the technical part of my private mail to Rocky of yesterday with
assessment and proposal which is in part combinable with Pete's.

------------------------------------------------------------------
My assessment for now:

The  strcpy() gesture in function _iso9660_recname_to_cstring() is part
of the conversion from UCS-2 to UTF-8.
I don't see the need for strncpy(), because the result of
cdio_charset_to_utf8() seems to be 0-terminated. (At least it does not
reply a length which would be needed if no terminating 0 is appended.)

But the memory safety depends on the caller having allocated cpy_result
with a sufficient size.
And there i see a potential problem with some languages.

The allocation of cpy_result happens in _iso9660_dir_to_statbuf(),
line 858 ff.:

  if (!dir_len) return NULL;

  i_fname = from_711(p_iso9660_dir->filename.len);

  /* .. string in statbuf is one longer than in p_iso9660_dir's listing '\1' */
  stat_len = sizeof(iso9660_stat_t) + i_fname + 2;

  /* Reuse multiextent p_stat if not NULL */
  if (!p_stat) {
    p_stat = calloc(1, stat_len);
    first_extent = true;

The size of p_stat->filename is then (i_fname + 2), i.e.  what is needed
for the UCS-2 representation of the name plus trailing 16-bit 0.
p_stat->filename becomes parameter cpy_result:

Line 956:
        if (!_iso9660_recname_to_cstring(&p_iso9660_dir->filename.str[1],
                                         i_inlen, NULL, p_stat->filename,
                                         u_joliet_level))
Line 965:
    if (!_iso9660_recname_to_cstring(&p_iso9660_dir->filename.str[1],
                                     i_inlen, NULL, p_stat->filename,
                                     u_joliet_level))

Normally text shrinks during the conversion from UCS-2 to UTF-8. But
there are UCS-2/UTF-16 characters which need more than 2 bytes. E.g.:
  https://www.compart.com/en/unicode/U+0800
  Samaritan Letter Alaf
  UTF-8 Encoding:       0xE0 0xA0 0x80
  UTF-16 Encoding:      0x0800

So if the UCS-2 name contains more than one of these UTF-8 3-byters the
surplus space for the 16-bit 0 can be used up and the cpy_result can
overflow, regardless of strcpy() versus strncpy().
(Each 1-byte UTF-8 character in the conversion result adds another byte
 of room for 3-byters. But i doubt that a string with samaritan letters
 will contain many letters from 7-bit US-ASCII.)

It looks like UTF-8 4-byters are not possible in UCS-2.
If we encounter UTF-16 instead, the Joliet characters which yield 4-byte
UTF-8 bytes are 4-byters themselves. (Conversion might yield poor results
if really UCS-2 is expected. But memory should be safe.)

--------------------------------------------------------------------------
Proposal in addition to Pete's:

In case of HAVE_JOLIET and u_joliet_level allocate in
   _iso9660_dir_to_statbuf()
3/2 of the UCS-2 name length (i_fname of a good UCS-2 string is divisibe
by 2):

  stat_len = sizeof(iso9660_stat_t) + 3 * i_fname / 2 + 2;

Rather an alternative to Pete's proposal:

Further i propose to add a parameter
  size_t cpy_result_size
to
  _iso9660_recname_to_cstring()
and to feed it with (3 * i_fname / 2 + 2)  from _iso9660_dir_to_statbuf().
In case of overflow, a message about programming error should be emitted.


Have a nice day :)

Thomas


Reply via email to