Hello,
Would you be willing to add the following patch, which mentions the
tab expansion and multibyte counting of '-L' in the "--help" screen
and in the manual?
Currently this is mentioned only in one sentence at the end of a long
paragraph, and is easily missed.
My wording could be improved, but I hope this will help prevent confusion with
'wc -L' output.
Somewhat related:
I seem to get an unexpected result with '-L' when forcing the C locale.
Perhaps I'm doing something wrong, or are there more intricate details of '-L'?
# This is a Unicode Character 'BLACK HEART SUIT' (U+2665)
$ printf "\xe2\x99\xa5\n"
# counting characters with a UTF-8 locale gives 1,
# counting bytes gives 3,
# longest line is 1 - as expected:
$ printf "\xe2\x99\xa5" | LC_ALL=en_US.UTF-8 wc -cmL
1 3 1
# using C locale, characters=bytes=3,
# but longest line is 0 ?
$ printf "\xe2\x99\xa5" | LC_ALL=C wc -cmL
3 3 0
This could be because of wc.c line 492, where "isprint" is called on each byte
(e.g. isprint('\xe2') is false in the C locale),
so these bytes are not counted towards the line length at all?
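
To illustrate the suspicion, here is a minimal sketch of a byte-at-a-time
width count. It is not the actual wc.c code: the function name
max_line_length_bytes is made up for this example, and the logic is a
simplified assumption (tab stops every 8 columns, only '\n' ends a line,
any byte for which isprint is false is skipped).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical, simplified model of a single-byte '-L' computation.
   This is an illustration only, not the code in src/wc.c.  */
size_t
max_line_length_bytes (const char *buf, size_t len)
{
  size_t linepos = 0;   /* width of the current line so far */
  size_t longest = 0;   /* widest line seen so far */

  for (size_t i = 0; i < len; i++)
    {
      /* Convert to unsigned char before calling isprint, since passing
         a negative char value to isprint is undefined.  */
      unsigned char c = (unsigned char) buf[i];

      if (c == '\n')
        {
          if (linepos > longest)
            longest = linepos;
          linepos = 0;
        }
      else if (c == '\t')
        linepos += 8 - (linepos % 8);   /* advance to the next tab stop */
      else if (isprint (c))
        linepos++;                      /* only printable bytes add width */
      /* Non-printable bytes (such as 0xe2 in the C locale) add nothing.  */
    }

  /* Account for a final line that has no trailing newline.  */
  if (linepos > longest)
    longest = linepos;
  return longest;
}
```

Called on the three bytes "\xe2\x99\xa5" with no prior setlocale call (so
the program runs in the C locale), this returns 0, which matches the
'wc -L' output above. Called on "a\tb\n" it returns 9.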
thanks,
- assaf
From 74b3d15948a86dd1aaff13529d9e7a62417e438f Mon Sep 17 00:00:00 2001
From: Assaf Gordon <[email protected]>
Date: Fri, 24 Apr 2015 22:18:41 -0400
Subject: [PATCH] wc: expand usage text of '-L' option
* src/wc.c: usage() mention tab-expansion and multibyte counting.
* doc/coreutils.texi: mention tab-expansion and multibyte counting under
'-L' option, and provide examples.
---
doc/coreutils.texi | 12 ++++++++++++
src/wc.c | 3 ++-
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 51d96b4..2e9d33c 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -3594,6 +3594,18 @@ Print only the newline counts.
@opindex -L
@opindex --max-line-length
Print only the maximum line lengths.
+Tab characters are assumed to align to every 8th position.
+Depending on the current locale, multibyte characters might be counted as
+consuming one character.
+
+For example, a 3-byte UTF-8 character is counted as one character,
+and a tab advances to the next 8th column position:
+@example
+$ printf "\xe2\x99\xa5\n" | LC_ALL=en_US.UTF-8 wc -L
+1
+$ printf "a\tb\n" | wc -L
+9
+@end example
@macro filesZeroFromOption{cmd,withTotalOption,subListOutput}
@item --files0-from=@var{file}
diff --git a/src/wc.c b/src/wc.c
index fe73d2c..5955aaf 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -129,7 +129,8 @@ the following order: newline, word, character, byte, maximum line length.\n\
--files0-from=F read input from the files specified by\n\
NUL-terminated names in file F;\n\
If F is - then read names from standard input\n\
- -L, --max-line-length print the length of the longest line\n\
+ -L, --max-line-length print the length of the longest line in screen\n\
+ columns (counting tabs and multi-byte characters)\n\
-w, --words print the word counts\n\
"), stdout);
fputs (HELP_OPTION_DESCRIPTION, stdout);
--
1.9.1