On Tue, Feb 10, 2009 at 12:59 PM, Jim Meyering <[email protected]> wrote:
> Nick Demou <[email protected]> wrote:
>> [...]
>> Thanks for the info Eric. I was almost sure this would be the case. In
>> fact I don't consider this as the main topic of my bug report. The
>> main topic for me is the documentation. The man and info page don't
>> make it clear that utf-8 is not supported. I believe that others after
>> me will spend a lot of time just to realize that "it's just a missing
>> feature". Do you have any thoughts regarding my suggestions on the
>> documentation?
>
> The "real" documentation is in coreutils.texi (generated to
> coreutils.info and available via "info coreutils"). There,
> under "tr invocation", it already has this caveat:

Oops, mea culpa. I did read the man page carefully, and I did search the
coreutils info pages before submitting this bug report. However, I only
searched for "utf" and "unicode", so I missed the warning, which contains
neither string.

> and since "man tr" does point to the authoritative source [the info pages]:
> [...]
> that may be enough.

I think it is enough for English-speaking users, but not for
non-English-speaking ones, who often have to deal with actual[1] UTF-8 text.
I would suggest the following small corrections:

A. for the info page
====================
Add a direct reference to UTF-8 and Unicode, like this:

from:
# Currently `tr' fully supports only single-byte characters.
# Eventually it will support multibyte characters;

to:
# Currently `tr' fully supports only single-byte characters.
# Eventually it will support multibyte characters (e.g. UTF-8
# or UTF-16 encoded Unicode characters);

B. for the man page
===================
Add a reference like this:

# Currently `tr' fully supports only single-byte characters.
# (Notable examples of multibyte characters that are not
# supported are UTF-8 and UTF-16 encoded Unicode characters.)

C. for the coreutils FAQ
========================
Add a question like this one:

# Q: What's the status of Unicode support?

(for which I cannot suggest a thorough answer, although I could try to dig
something out of the current documentation if no one else is able to help
at the moment)

or

# Q: I get funny/no/wrong results when dealing with
#    UTF-8/Unicode input
# A: UTF-8 and UTF-16 encoded Unicode text is made up
#    of multibyte characters, which are not well supported
#    by some coreutils programs.

___________________
[1] UTF-8 above the ASCII char set

-- 
"The software is licensed, not sold" -- MICROSOFT LICENSE TERMS

_______________________________________________
Bug-coreutils mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/bug-coreutils
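
For readers unsure what "single-byte characters only" means in practice, here is a minimal Python sketch. It is an illustration under assumptions, not tr's actual code: the helper name bytewise_upper is hypothetical and only imitates a translator that works one byte at a time. It shows that a non-ASCII character such as "é" occupies two bytes in UTF-8, so a byte-oriented mapping of a-z to A-Z never sees it as one character.

```python
import string


def bytewise_upper(data: bytes) -> bytes:
    """Uppercase ASCII a-z one byte at a time; all other bytes pass through.

    Hypothetical helper for illustration only, not how tr is implemented.
    """
    table = bytes.maketrans(
        string.ascii_lowercase.encode("ascii"),
        string.ascii_uppercase.encode("ascii"),
    )
    return data.translate(table)


text = "caf\u00e9"              # "café": the last character is outside ASCII
encoded = text.encode("utf-8")  # "é" becomes the two bytes 0xC3 0xA9

# Character count versus byte count differ for non-ASCII text.
print(len(text), len(encoded))

# Byte-wise translation leaves the two bytes of "é" untouched.
print(bytewise_upper(encoded).decode("utf-8"))

# A character-aware operation handles it as a single character.
print(text.upper())
```

The byte-wise result keeps the lowercase "é", while the character-aware call uppercases it, which is the kind of surprise a more explicit note in the documentation would help users anticipate.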
