On 25/04/15 03:38, Assaf Gordon wrote:
> Hello,
> 
> Would you be willing to add the following patch, mentioning tab-expansion and 
> multibyte counting of '-L'
> in the "--help" screen, and the manual?
> Currently this is mentioned only in one sentence at the end of a long 
> paragraph, and is easily missed.
> My wording could be improved, but I hope this will help prevent confusion 
> with 'wc -L' output.

Wow, that is confusing/ambiguous.
I'll apply the attached in your name.

> 
> Somewhat related:
> I seem to get an unexpected result with '-L' when forcing the C locale.
> Perhaps I'm doing something wrong, or are there more intricate details to '-L'?
> 
> # This is a Unicode Character 'BLACK HEART SUIT' (U+2665)
> $ printf "\xe2\x99\xa5\n"
> 
> # counting characters with a UTF-8 locale gives 1,
> # counting bytes gives 3,
> # the longest line is 1, as expected:
> $ printf "\xe2\x99\xa5" | LC_ALL=en_US.UTF-8 wc -cmL
>        1       3       1
> 
> 
> # using the C locale, characters = bytes = 3,
> # but the longest line is 0?
> $ printf "\xe2\x99\xa5" | LC_ALL=C wc -cmL
>        3       3       0
> 
> Could this be because of wc.c line 492, where isprint() is called on each
> byte (e.g. isprint('\xe2') is false),
> so these bytes are not counted at all?

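Yes. You could filter with sed to adjust, since replacing every character
(or, in the C locale, every byte) with a printable '.' makes each one count
as exactly one column: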

         sed 's/././g' | wc -L    # count chars
LC_ALL=C sed 's/././g' | wc -L    # count bytes

cheers,
Pádraig.
From f410be7915aeaa2478c91c125723009812e57c1d Mon Sep 17 00:00:00 2001
From: Assaf Gordon <[email protected]>
Date: Wed, 13 May 2015 02:46:29 +0100
Subject: [PATCH] doc: clarify the operation of wc -L

* src/wc.c (usage): State that it calculates display width.
* doc/coreutils.texi (wc invocation): Detail the distinct
items used to determine the display width.
---
 doc/coreutils.texi | 5 ++++-
 src/wc.c           | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 51d96b4..6a69b75 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -3593,7 +3593,10 @@ Print only the newline counts.
 @itemx --max-line-length
 @opindex -L
 @opindex --max-line-length
-Print only the maximum line lengths.
+Print only the maximum display widths.
+Tabs are set at every 8th column.
+Display widths of wide characters are considered.
+Non-printable characters are given 0 width.
 
 @macro filesZeroFromOption{cmd,withTotalOption,subListOutput}
 @item --files0-from=@var{file}
diff --git a/src/wc.c b/src/wc.c
index ae7ae95..eb7b5b6 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -134,7 +134,7 @@ the following order: newline, word, character, byte, maximum line length.\n\
       --files0-from=F    read input from the files specified by\n\
                            NUL-terminated names in file F;\n\
                            If F is - then read names from standard input\n\
-  -L, --max-line-length  print the length of the longest line\n\
+  -L, --max-line-length  print the maximum display width\n\
   -w, --words            print the word counts\n\
 "), stdout);
       fputs (HELP_OPTION_DESCRIPTION, stdout);
-- 
2.3.4
