Configuration Information [Automatically generated, do not change]:
Machine: i386
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='i386'
-DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='i386-pc-linux-gnu'
-DCONF_VENDOR='pc' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL
-DHAVE_CONFIG_H -I. -I../bash -I../bash/include -I../bash/lib -g -O2
uname output: Linux tazzelwurm 2.6.11hcz1 #2 Fri Mar 11 20:01:21 CET 2005 i686
GNU/Linux
Machine Type: i386-pc-linux-gnu
Bash Version: 3.0
Patch Level: 16
Release Status: release
Description: If a string contains an invalid UTF-8 sequence, ${#var}
reports its length as the number of characters that precede
the first invalid byte.
This makes it possible to construct a string that "test -n"
and "test -z" both treat as non-empty, but whose length
${#var} reports as zero.
Repeat-By:
x=$'\xff'foobar
LC_ALL=C
echo ${#x}
# reports: 7
LC_ALL=en_US.utf-8
echo ${#x}
# reports: 0
[ -n "$x" ] && echo non-empty
# echoes: non-empty
x=baz$'\xff'foobar
LC_ALL=en_US.utf-8
echo ${#x}
# reports: 3
Fix:
I understand that, strictly speaking, this is undefined
behavior, but I'd suggest that counting should not stop when
an invalid multibyte sequence is encountered. Instead, the
sequence could be counted by its number of bytes (or as 1),
since the string is definitely non-empty.
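Until the counting is changed, a possible workaround is to take the
length in the C locale, where every byte counts as one character. The
following is only a sketch: byte_len is a hypothetical helper name, not
part of bash, and it assumes that a byte count is what the caller wants.

```shell
# byte_len (hypothetical helper): print the length of $1 in bytes.
# The subshell keeps the LC_ALL change from leaking into the caller.
byte_len() {
    (
        LC_ALL=C
        printf '%s\n' "${#1}"
    )
}

# Same bytes as x=$'\xff'foobar in the report above.
x=$(printf '\377foobar')
byte_len "$x"    # prints 7
```

Because the locale is switched only inside the subshell, the caller's
LC_ALL setting is unchanged afterwards.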
Thanks,
Heike
_______________________________________________
Bug-bash mailing list
http://lists.gnu.org/mailman/listinfo/bug-bash