Jim Meyering wrote:
> Jim Meyering wrote:
>> The fact that a Turkish I-with-dot (U+0130) on a matched line of input
>> can make "grep -i" generate corrupt output in nearly any UTF-8 locale is
>> pretty serious, so I want to make a bug-fix release.
>>
>> Does anyone have a pending change or a bug report that we should
>> consider first?
>
> For reference, here are the pending NEWS entries:
>
> ** Bug fixes
>
>   grep -i, in a multi-byte locale, when matching a line containing a character
>   like the UTF-8 Turkish I-with-dot (U+0130) (whose lower-case representation
>   occupies fewer bytes), would print an incomplete output line.
>   Similarly, with a matched line containing a character (e.g., the Latin
>   capital I in a Turkish UTF-8 locale), where the lower-case representation
>   occupies more bytes, grep could print garbage.
>   [bug introduced in grep-2.6]
>
>   --include and --exclude can again be combined, and again apply to
>   the command line, e.g., "grep --include='*.[ch]' --exclude='system.h'
>   PATTERN *" again reads all *.c and *.h files except for system.h.
>   [bug introduced in grep-2.6]
>
> ** New features
>
>   'grep' without -z now treats a sparse file as binary, if it can
>   easily determine that the file is sparse.
>
> ** Dropped features
>
>   Bootstrapping with Makefile.boot has been broken since grep 2.6,
>   and was removed.

No feedback, so I'm preparing for the release, probably tomorrow.
To that end, here's an added test and a gnulib/bootstrap update:

From d40bc5fd419f55dc9a1ce9d8dcd811fbc592587b Mon Sep 17 00:00:00 2001
From: Jim Meyering <[email protected]>
Date: Sun, 17 Jun 2012 19:03:40 +0200
Subject: [PATCH 1/2] tests: add another turkish-I-related test case

* tests/turkish-I-without-dot: Also exercise the case in which the
original string and the lower-case buffer have precisely the same
length (22 bytes here), yet internal offsets do differ.
---
 tests/turkish-I-without-dot | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tests/turkish-I-without-dot b/tests/turkish-I-without-dot
index cab4ede..9f92502 100755
--- a/tests/turkish-I-without-dot
+++ b/tests/turkish-I-without-dot
@@ -41,7 +41,15 @@ fail=0
 
 printf "IIIIIII\n" > in || framework_failure_
 LC_ALL=tr_TR.utf8 grep -i .... in > out || fail=1
+compare out in || fail=1
 
+# Also exercise the case in which the original string and the lower-case
+# buffer have precisely the same length (22 bytes here), yet internal
+# offsets do differ.  Lengths are the same because while some bytes shrink
+# when converted to lower case, others grow, and here they balance out.
+i='I\xC4\xB0'
+printf "$i$i$i$i$i$i$i\n" > in || framework_failure_
+LC_ALL=tr_TR.utf8 grep -i .... in > out || fail=1
 compare out in || fail=1
 
 Exit $fail
--
1.7.11.1.104.ge7b44f1


From cf8005879d77fdf64d5f2b2368eb9e4d96c556b5 Mon Sep 17 00:00:00 2001
From: Jim Meyering <[email protected]>
Date: Tue, 3 Jul 2012 14:53:25 +0200
Subject: [PATCH 2/2] build: update gnulib submodule, bootstrap, init.sh

---
 bootstrap     | 11 ++++++-----
 gnulib        |  2 +-
 tests/init.sh |  5 ++---
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/bootstrap b/bootstrap
index ce37a2c..e984910 100755
--- a/bootstrap
+++ b/bootstrap
@@ -1,6 +1,6 @@
 #! /bin/sh
 # Print a version string.
-scriptversion=2012-05-15.06; # UTC
+scriptversion=2012-07-03.20; # UTC
 
 # Bootstrap this package from checked-out sources.
 
@@ -215,7 +215,7 @@ find_tool ()
   eval "export $find_tool_envvar"
 }
 
-# Find sha1sum, named gsha1sum on MacPorts, and shasum on MacOS 10.6.
+# Find sha1sum, named gsha1sum on MacPorts, and shasum on Mac OS X 10.6.
 find_tool SHA1SUM sha1sum gsha1sum shasum
 
 # Override the default configuration, if necessary.
@@ -230,7 +230,6 @@ esac
 test -z "${gnulib_extra_files}" && \
   gnulib_extra_files="
         $build_aux/install-sh
-        $build_aux/missing
         $build_aux/mdate-sh
         $build_aux/texinfo.tex
         $build_aux/depcomp
@@ -855,7 +854,8 @@ echo "$0: $gnulib_tool $gnulib_tool_options --import ..."
 $gnulib_tool $gnulib_tool_options --import $gnulib_modules &&
 
 for file in $gnulib_files; do
-  symlink_to_dir "$GNULIB_SRCDIR" $file || exit
+  symlink_to_dir "$GNULIB_SRCDIR" $file \
+    || { echo "$0: failed to symlink $file" 1>&2; exit 1; }
 done
 
 bootstrap_post_import_hook \
@@ -896,7 +896,8 @@ for file in $gnulib_extra_files; do
     build-aux/*) dst=$build_aux/${file#build-aux/};;
     *) dst=$file;;
   esac
-  symlink_to_dir "$GNULIB_SRCDIR" $file $dst || exit
+  symlink_to_dir "$GNULIB_SRCDIR" $file $dst \
+    || { echo "$0: failed to symlink $file" 1>&2; exit 1; }
 done
 
 if test $with_gettext = yes; then
diff --git a/gnulib b/gnulib
index f6c2431..d0ea2a1 160000
--- a/gnulib
+++ b/gnulib
@@ -1 +1 @@
-Subproject commit f6c2431e873d1c9972f97cb610ab26491d626410
+Subproject commit d0ea2a12f6fb377f930886d404f3dfc2a732537d
diff --git a/tests/init.sh b/tests/init.sh
index f525a7c..5f6e638 100644
--- a/tests/init.sh
+++ b/tests/init.sh
@@ -411,8 +411,7 @@ path_prepend_ ()
     case $path_dir_ in
       '') fail_ "invalid path dir: '$1'";;
       /*) abs_path_dir_=$path_dir_;;
-      *) abs_path_dir_=`cd "$initial_cwd_/$path_dir_" && echo "$PWD"` \
-           || fail_ "invalid path dir: $path_dir_";;
+      *) abs_path_dir_=$initial_cwd_/$path_dir_;;
     esac
     case $abs_path_dir_ in
       *:*) fail_ "invalid path dir: '$abs_path_dir_'";;
@@ -448,7 +447,7 @@ setup_ ()
   pfx_=`testdir_prefix_`
   test_dir_=`mktempd_ "$initial_cwd_" "$pfx_-$ME_.XXXX"` \
     || fail_ "failed to create temporary directory in $initial_cwd_"
-  cd "$test_dir_"
+  cd "$test_dir_" || fail_ "failed to cd to temporary directory"
 
   # As autoconf-generated configure scripts do, ensure that IFS
   # is defined initially, so that saving and restoring $IFS works.
--
1.7.11.1.104.ge7b44f1
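
For anyone who wants to sanity-check the "22 bytes here" remark in the new test case, here is a rough sketch of the byte arithmetic. It is not part of either patch. It assumes the Turkish lower-case mappings (I becomes U+0131, U+0130 becomes i) and hard-codes them instead of consulting a locale, and it uses octal escapes because \x in printf is not portable. The names upper, lower and line_bytes are made up for this illustration.

```shell
#!/bin/sh
# Sketch only: the lower-case mapping is hard-coded, not taken from a locale.
# Octal escapes stand in for \xC4\xB0 etc., since \x in printf is not portable.

upper='I\304\260'    # "I" (1 byte) + U+0130 (2 bytes, C4 B0)
lower='\304\261i'    # U+0131 (2 bytes, C4 B1) + "i" (1 byte)

# Count the bytes in seven repetitions of a printf-escaped unit plus a newline.
line_bytes () {
  printf "$1$1$1$1$1$1$1\n" | wc -c | tr -d ' '
}

orig_len=$(line_bytes "$upper")
lower_len=$(line_bytes "$lower")

echo "original:   $orig_len bytes"
echo "lower-case: $lower_len bytes"
```

Both counts come out to 22, yet within each 3-byte unit the character boundary falls after byte 1 in the original and after byte 2 in the lower-cased copy, so offsets computed in one buffer do not line up with the other.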
