On 03/01/2014 03:07 PM, Paul Eggert wrote:
Eric Blake wrote:
It might help to mention all three characters in the NEWS blurb.
Thanks, I pushed the attached patch.
I see now that my documentation fix went too far, as it promised
behavior that the regex code does not in fact implement. The plan is to
fix the DFA code to match what the regex code does, and the first step
is to remove the promises that aren't being kept now (when the regex
code is used). I pushed the attached documentation patch.
From a612c38282c056f6535fa50f6b5de69d83030777 Mon Sep 17 00:00:00 2001
From: Paul Eggert <[email protected]>
Date: Thu, 6 Mar 2014 13:16:48 -0800
Subject: [PATCH] doc: do not overpromise --ignore-case's behavior
* NEWS: Omit vague statement about titlecase that could be
misinterpreted, and is more trouble than it's worth.
* doc/grep.texi: Add @documentencoding. Fix copyright range to
use endash not hyphen.
(Matching Control): Do not overpromise what --ignore-case will do.
Give examples of corner cases where the documentation does not
specify behavior.
---
NEWS | 2 --
doc/grep.texi | 19 ++++++++++++++++---
2 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/NEWS b/NEWS
index eaa3c96..2a62e7b 100644
--- a/NEWS
+++ b/NEWS
@@ -20,8 +20,6 @@ GNU grep NEWS -*- outline -*-
per the documentation on how grep's -w works.
grep -i no longer mishandles patterns containing titlecase characters.
- A pattern character now matches its lowercase and uppercase
- counterparts, even when they both differ from the pattern.
For example, in a locale containing the titlecase character
'ǈ' (U+01C8 LATIN CAPITAL LETTER L WITH SMALL LETTER J),
'grep -i ǈ' now matches both 'Ǉ' (U+01C7 LATIN CAPITAL LETTER LJ)
diff --git a/doc/grep.texi b/doc/grep.texi
index b9d6ec5..f631f03 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -13,10 +13,12 @@
@syncodeindex vr cp
@c %**end of header
+@documentencoding UTF-8
+
@copying
This manual is for @command{grep}, a pattern matching engine.
-Copyright @copyright{} 1999-2002, 2005, 2008-2014 Free Software Foundation,
+Copyright @copyright{} 1999--2002, 2005, 2008--2014 Free Software Foundation,
Inc.
@quotation
@@ -193,8 +195,19 @@ The empty file contains zero patterns, and therefore matches nothing.
@opindex -y
@opindex --ignore-case
@cindex case insensitive search
-Ignore case distinctions, so that a pattern character matches not only
-itself in the text, but also its lowercase and uppercase counterparts, if any.
+Ignore case distinctions, so that characters that differ only in case
+match each other. Although this is straightforward when letters
+differ in case only via lowercase-uppercase pairs, the behavior is
+unspecified in other situations. For example, uppercase ``S'' has an
+unusual lowercase counterpart ``ſ'' (Unicode character U+017F, LATIN
+SMALL LETTER LONG S) in many locales, and it is unspecified whether
+this unusual character matches ``S'' or ``s'' even though uppercasing
+it yields ``S''. Another example: the lowercase German letter ``ß''
+(U+00DF, LATIN SMALL LETTER SHARP S) is normally capitalized as the
+two-character string ``SS'' but it does not match ``SS'', and it might
+not match the uppercase letter ``ẞ'' (U+1E9E, LATIN CAPITAL LETTER
+SHARP S) even though lowercasing the latter yields the former.
+SHARP S) even though lowercasing the latter yields the former.
+
@option{-y} is an obsolete synonym that is provided for compatibility.
(@option{-i} is specified by POSIX.)
--
1.8.5.3
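
For readers who want to see why these corner cases are awkward, here is
a small sketch of the Unicode case mappings that the new documentation
text refers to. It uses Python's built-in str.lower() and str.upper()
purely as an illustration; it is not grep's regex or DFA code and makes
no claim about what grep does. The helper names case_counterparts and
naive_ignore_case_equal are hypothetical and exist only for this example.

```python
# Illustration only: Python's built-in Unicode case mappings.
# This is NOT grep's matcher; the helper names below are hypothetical.

LONG_S = "\u017f"           # LATIN SMALL LETTER LONG S
SHARP_S = "\u00df"          # LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S
TITLECASE_LJ = "\u01c8"     # LATIN CAPITAL LETTER L WITH SMALL LETTER J


def case_counterparts(ch):
    """Return the (lowercase, uppercase) mappings of a string."""
    return ch.lower(), ch.upper()


def naive_ignore_case_equal(a, b):
    """Compare by lowercasing both sides.

    An assumed, deliberately simple definition of "ignore case",
    used here only to show where such a definition falls short.
    """
    return a.lower() == b.lower()


if __name__ == "__main__":
    for name, ch in [("long s", LONG_S), ("sharp s", SHARP_S),
                     ("capital sharp s", CAPITAL_SHARP_S),
                     ("titlecase Lj", TITLECASE_LJ)]:
        lower, upper = case_counterparts(ch)
        print(name, repr(ch), "lower:", repr(lower), "upper:", repr(upper))
```

Under this naive lowercase comparison, long s matches neither "S" nor
"s" although uppercasing it yields "S", and sharp s does not match "SS"
although uppercasing it yields "SS". A one-step case conversion cannot
settle these cases, which is why the documentation leaves them
unspecified.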