Re: Coreutils instruction changes

Alexander E. Patrakov Thu, 24 Aug 2006 21:18:42 -0700

Matthew Burgess wrote:

2) The i18n patch isn't going to be accepted in its current state, which I already suspected. It's incomplete and makes the code harder to maintain. I'm currently waiting on feedback on how to proceed from here.

Either disagree with the maintainers (because the patch is simply necessary for an acceptable level of UTF-8 support, and not only in coreutils), or drop UTF-8 support from LFS and BLFS completely (because it is not ready in "unpatched upstream" and never will be, because no other LFS project has implemented it, and because LFS is intended to be a "minimal base"). UTF-8 is a nightmare to maintain. It creates a rather big but unwritten blacklist of packages, and it makes experimentation with other areas of the LiveCD difficult for other maintainers (they are afraid to break the thing).

Also, Microsoft's approach to Unicode (keep an 8-bit encoding for legacy applications and support Unicode as UCS rather than through a UTF representation) is technically superior. That is also what you get when you use Qt-based GUI applications in non-UTF-8 locales. Too bad it is impossible to implement proper in-kernel NFSv4 support with this approach.

If you drop only the coreutils patch (as opposed to all UTF-8 support), add the following note to the book:

{{{

Many other distributions apply the so-called i18n patch to coreutils. It originates from the OpenI18N group and is currently maintained by Red Hat. The patch makes the changes necessary for "cut", "pr", "uniq", "expand", "fold", "join", "unexpand" and "sort" to process multibyte characters correctly. Without the patch, the following issues occur:

1) "cut" has no way to take n characters (as opposed to n bytes), and it can damage the last character by cutting it in the middle.

2) "fold" uses the number of bytes, not the number of character cells, to decide where to fold the string. The result is premature folding, or breaking the string in the middle of a multibyte character (a no-no).

3) Utilities that take a separator character as a command-line parameter cannot be told to use a multibyte character as a separator.

4) The OpenI18N testsuite (required for LSB certification) doesn't pass.

However, the upstream maintainers of coreutils have rejected the patch. It is incomplete (for example, the "tr '[:upper:]' '[:lower:]'" command doesn't work correctly in multibyte locales even with the patch), and it makes the code harder to maintain. Thus, if you need to process non-ASCII text in UTF-8 locales, you have to use other utilities, such as Perl.

}}}
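The minimal C sketch below makes issues 1 and 2 concrete. It is not code from the i18n patch. It shows what a multibyte-aware "cut -c" or "fold" has to do:

- walk the string with `mbrtowc` to count characters instead of bytes;
- ask `wcwidth` how many screen columns each character occupies;
- truncate only at character boundaries.

The sketch assumes the caller has already selected a UTF-8 locale with `setlocale`. The helper names `measure` and `prefix_bytes` are hypothetical.

```c
#define _XOPEN_SOURCE 700   /* needed for wcwidth() with glibc */
#include <locale.h>         /* setlocale(), called by the user of these helpers */
#include <string.h>
#include <wchar.h>

/* Count characters and screen columns in a multibyte string in the
   current locale. Returns 0 on success, -1 on an invalid or
   incomplete multibyte sequence. */
static int measure(const char *s, size_t *chars, int *cols)
{
    mbstate_t st;
    size_t len = strlen(s);

    memset(&st, 0, sizeof st);          /* start in the initial shift state */
    *chars = 0;
    *cols = 0;
    while (len > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, len, &st);
        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;                  /* broken or truncated character */
        int w = wcwidth(wc);            /* -1 for non-printable characters */
        (*chars)++;
        if (w > 0)
            *cols += w;
        s += n;                         /* n bytes made up one character */
        len -= n;
    }
    return 0;
}

/* Return how many bytes the first n characters of s occupy, so that
   truncating at that offset never splits a multibyte sequence. */
static size_t prefix_bytes(const char *s, size_t n)
{
    mbstate_t st;
    size_t len = strlen(s), used = 0;

    memset(&st, 0, sizeof st);
    while (n > 0 && used < len) {
        size_t k = mbrtowc(NULL, s + used, len - used, &st);
        if (k == (size_t)-1 || k == (size_t)-2)
            break;                      /* stop before invalid input */
        used += k;
        n--;
    }
    return used;
}
```

Take, for example, `setlocale(LC_ALL, "C.UTF-8")`, where the locale name is an assumption (any installed UTF-8 locale should do). The string "naïve 日本" is then 13 bytes but only 8 characters, and `prefix_bytes` returns 4 for the first 3 characters. A character-based cut therefore keeps "naï" intact, whereas "cut -b1-3" stops in the middle of "ï".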

Also note that the patch has existed for 5 years (!!!) and is still not in acceptable shape. It looks like the parties interested in what the patch provides (such as Red Hat and LSB) are perfectly OK with the deviation.

3) The suppress-uptime-kill-su patch is obviously Linux-specific, so it isn't suitable for upstream.

s/Linux/LFS/

4) We currently use a sed to avoid a supposed buffer overflow in translated versions of `who'. This is unnecessary now, as it's been fixed in a different manner, so the sed can be removed from the book.

Well, that's only partially correct. See the existing code from coreutils-5.96:

 if (include_idle && !short_output && strlen (idle) < sizeof x_idle - 1)
   sprintf (x_idle, " %-6s", idle);
 else
   *x_idle = '\0';

This means that if the string doesn't fit, it is removed from the header completely (so you are right that there is no overflow). But that's still not perfect, because the headers then become misaligned with their columns. The sed command that disables i18n for the "who" program makes the output better:


sed -i '/config.h/a#undef ENABLE_NLS' src/who.c
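The misalignment can be reproduced outside of "who" with a small C sketch that mirrors the guard quoted above. Some details are assumptions rather than facts from who.c:

- the buffer size (one leading space, a 6-character field, and the terminating NUL) is inferred from the " %-6s" format;
- the helper name `format_idle` is hypothetical.

The key point is that `strlen` counts bytes. A 4-letter Cyrillic label therefore takes 8 bytes in UTF-8 and is dropped, even though it would fit in 6 screen columns.

```c
#include <stdio.h>
#include <string.h>

/* Assumed field width, taken from the " %-6s" format in who.c. */
#define IDLE_FIELD 6

/* Hypothetical helper mirroring the coreutils-5.96 guard: write
   " %-6s" into out only if idle fits, otherwise leave out empty.
   Note that strlen() measures bytes, not characters or columns. */
static void format_idle(char *out, size_t outsz, const char *idle)
{
    if (strlen(idle) < outsz - 1)
        snprintf(out, outsz, " %-*s", IDLE_FIELD, idle);
    else
        *out = '\0';    /* column silently dropped from the header */
}
```

With `char buf[1 + IDLE_FIELD + 1]`, the call `format_idle(buf, sizeof buf, "IDLE")` yields " IDLE  ", while a hypothetical translated label such as "ТЕСТ" leaves `buf` empty. The headers after it then shift left relative to the data rows. The `%-6s` width is also counted in bytes, so a multibyte label that does fit (for example, 3 Cyrillic letters in 6 bytes) gets no padding and still ends up too narrow on screen.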

--
Alexander E. Patrakov
--
http://linuxfromscratch.org/mailman/listinfo/lfs-dev