On 25/04/15 03:38, Assaf Gordon wrote:
> Hello,
>
> Would you be willing to add the following patch, mentioning tab-expansion and
> multibyte counting of '-L' in the "--help" screen and the manual?
> Currently this is mentioned only in one sentence at the end of a long
> paragraph, and is easily missed.
> My wording could be improved, but I hope this will help prevent confusion
> with 'wc -L' output.
Wow, that is confusing/ambiguous.
I'll apply the attached in your name.
>
> Somewhat related:
> I seem to get an unexpected result with '-L' when forcing the C locale.
> Perhaps I'm doing something wrong, or there are more intricate details of '-L'?
>
> # This is a Unicode Character 'BLACK HEART SUIT' (U+2665)
> $ printf "\xe2\x99\xa5\n"
>
> # counting characters with a UTF-8 locale gives 1,
> # counting bytes gives 3,
> # longest line is 1 - as expected:
> $ printf "\xe2\x99\xa5" | LC_ALL=en_US.UTF-8 wc -cmL
> 1 3 1
>
>
> # using C locale, characters=bytes=3,
> # but longest line is 0 ?
> $ printf "\xe2\x99\xa5" | LC_ALL=C wc -cmL
> 3 3 0
>
> This could be because of wc.c line 492, where "isprint" is called on each
> byte (e.g. isprint('\xe2') is false),
> and so these characters are not counted at all?
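Yes. In the C locale each byte that fails isprint() is given zero width,
so none of those three bytes is counted. You could filter with sed,
turning every character (or byte) into a printable '.', to adjust:
  sed 's/././g' | wc -L # count chars
  LC_ALL=C sed 's/././g' | wc -L # count bytes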
cheers,
Pádraig.
From f410be7915aeaa2478c91c125723009812e57c1d Mon Sep 17 00:00:00 2001
From: Assaf Gordon <[email protected]>
Date: Wed, 13 May 2015 02:46:29 +0100
Subject: [PATCH] doc: clarify the operation of wc -L
* src/wc.c (usage): State that it calculates display width.
* doc/coreutils.texi (wc invocation): Detail the distinct
items used to determine the display width.
---
doc/coreutils.texi | 5 ++++-
src/wc.c | 2 +-
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 51d96b4..6a69b75 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -3593,7 +3593,10 @@ Print only the newline counts.
@itemx --max-line-length
@opindex -L
@opindex --max-line-length
-Print only the maximum line lengths.
+Print only the maximum display widths.
+Tabs are set at every 8th column.
+Display widths of wide characters are considered.
+Non-printable characters are given 0 width.
@macro filesZeroFromOption{cmd,withTotalOption,subListOutput}
@item --files0-from=@var{file}
diff --git a/src/wc.c b/src/wc.c
index ae7ae95..eb7b5b6 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -134,7 +134,7 @@ the following order: newline, word, character, byte, maximum line length.\n\
--files0-from=F read input from the files specified by\n\
NUL-terminated names in file F;\n\
If F is - then read names from standard input\n\
- -L, --max-line-length print the length of the longest line\n\
+ -L, --max-line-length print the maximum display width\n\
-w, --words print the word counts\n\
"), stdout);
fputs (HELP_OPTION_DESCRIPTION, stdout);
--
2.3.4