-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
$ cat <<\EOF | m4
translit(`«abc~', `~-»')
EOF
«
Oops - ranges that extended across the 0x7f-0x80 boundary misbehaved on
machines where char is signed; the example above should have printed
abc. Also, our testsuite assumes ASCII in the translit tests, but so far
no one has reported failures when porting to EBCDIC platforms (where the
range A-Z covers more than just the 26 letters), so I doubt it is worth
worrying about.
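To illustrate the fix, here is a minimal standalone sketch of range expansion done on unsigned char values. It is not the code from src/builtin.c: expand_ranges_sketch, expand_ranges_demo, put and the fixed-size output buffer are hypothetical stand-ins (the real expand_ranges grows an obstack), and to_uchar is written out locally on the assumption that it simply converts a char to unsigned char. The demo also assumes Latin-1 bytes for the guillemets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed behavior of the patch's to_uchar(): view a char as 0..255,
   even on machines where plain char is signed.  */
static unsigned char
to_uchar (char ch)
{
  return (unsigned char) ch;
}

/* Hypothetical helper: append byte C to OUT, leaving room for the NUL.  */
static int
put (char *out, size_t cap, size_t *n, int c)
{
  if (*n + 1 >= cap)
    return 0;
  out[(*n)++] = (char) c;
  return 1;
}

/* Expand ranges such as "a-e" or "9-0" from S into OUT (capacity CAP).
   Returns the expanded length, or (size_t) -1 if OUT is too small.  */
size_t
expand_ranges_sketch (const char *s, char *out, size_t cap)
{
  size_t n = 0;
  unsigned char from;           /* previous byte, '\0' at the start */

  for (from = '\0'; *s != '\0'; from = to_uchar (*s++))
    {
      if (*s == '-' && from != '\0')
        {
          unsigned char to = to_uchar (*++s);
          int step, c;

          if (to == '\0')
            {
              /* A trailing dash stands for itself.  */
              if (!put (out, cap, &n, '-'))
                return (size_t) -1;
              break;
            }
          /* FROM was already emitted; walk toward TO, inclusive.  */
          step = from <= to ? 1 : -1;
          for (c = from; c != to; )
            {
              c += step;
              if (!put (out, cap, &n, c))
                return (size_t) -1;
            }
        }
      else if (!put (out, cap, &n, *s))
        return (size_t) -1;
    }
  if (cap == 0)
    return (size_t) -1;
  out[n] = '\0';
  return n;
}

/* Check the examples from the documentation part of this patch.  */
void
expand_ranges_demo (void)
{
  char buf[300];
  size_t n;

  n = expand_ranges_sketch ("9-0", buf, sizeof buf);
  assert (n == 10 && strcmp (buf, "9876543210") == 0);

  n = expand_ranges_sketch ("<;>a-c-a", buf, sizeof buf);
  assert (n == 8 && strcmp (buf, "<;>abcba") == 0);

  /* '~' (0x7e) through Latin-1 close-guillemet (0xbb): 62 bytes,
     including the open-guillemet 0xab.  */
  n = expand_ranges_sketch ("~-\xbb", buf, sizeof buf);
  assert (n == 62);
  assert (memchr (buf, 0xab, n) != NULL);
}
```

With a signed char, '~' is 126 and 0xbb is -69, so the same range would run backwards through the ASCII letters rather than forwards through 0x7f-0xbb, which matches the stray guillemet in the transcript above.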
2006-11-11 Eric Blake <[EMAIL PROTECTED]>
* src/builtin.c: Remove unnecessary casts.
(expand_ranges): Make 8-bit clean.
* doc/m4.texinfo (Translit): Add tests and wording.
* NEWS: Document this fix.
- --
Life is short - so eat dessert first!
Eric Blake [EMAIL PROTECTED]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFFVcgC84KuGfSFAYARAqwsAKC16u8NpG48in0OOMQslWt66JxO9QCguVHT
wpE2vBw5R1xMMN431yt6WE4=
=fE0O
-----END PGP SIGNATURE-----
Index: NEWS
===================================================================
RCS file: /sources/m4/m4/NEWS,v
retrieving revision 1.1.1.1.2.79
diff -u -p -r1.1.1.1.2.79 NEWS
--- NEWS 1 Nov 2006 13:44:53 -0000 1.1.1.1.2.79
+++ NEWS 11 Nov 2006 12:48:54 -0000
@@ -43,7 +43,8 @@ Version 1.4.8 - ?? ??? 2006, by ?? (CVS
* The `changecom' and `changequote' macros now treat an empty second
argument the same as if it were missing, rather than using the empty
string and making it impossible to end a comment or quote.
-* The `translit' macro now operates in linear instead of quadratic time.
+* The `translit' macro now operates in linear instead of quadratic time,
+ and is now eight-bit clean.
* The `-D', `-U', `-s', and `-t' command line options now take effect
after any files encountered earlier on the command line, rather than up
front, as is done in traditional implementations and required by POSIX.
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.1.1.1.2.99
diff -u -p -r1.1.1.1.2.99 m4.texinfo
--- doc/m4.texinfo 8 Nov 2006 05:08:26 -0000 1.1.1.1.2.99
+++ doc/m4.texinfo 11 Nov 2006 12:48:56 -0000
@@ -2828,9 +2828,9 @@ foo
The quotation strings can safely contain eight-bit characters.
@ignore
-Yuck. I know of no clean way to render an 8-bit character in both info
-and dvi. This example uses the `open-guillemot' and `close-guillemot'
-characters of the Latin-1 character set.
+@comment Yuck. I know of no clean way to render an 8-bit character in
+@comment both info and dvi. This example uses the `open-guillemot' and
+@comment `close-guillemot' characters of the Latin-1 character set.
@example
define(`a', `b')
@@ -3058,9 +3058,9 @@ changecom(`#', `')
The comment strings can safely contain eight-bit characters.
@ignore
-Yuck. I know of no clean way to render an 8-bit character in both info
-and dvi. This example uses the `open-guillemot' and `close-guillemot'
-characters of the Latin-1 character set.
+@comment Yuck. I know of no clean way to render an 8-bit character in
+@comment both info and dvi. This example uses the `open-guillemot' and
+@comment `close-guillemot' characters of the Latin-1 character set.
@example
define(`a', `b')
@@ -4134,14 +4134,15 @@ translation pass is made, even if charac
appear in @var{chars}.
As a @acronym{GNU} extension, both @var{chars} and @var{replacement} can
-contain character-ranges,
-e.g., @samp{a-z} (meaning all lowercase letters) or @samp{0-9} (meaning
-all digits). To include a dash @samp{-} in @var{chars} or
-@var{replacement}, place it first or last.
-
-It is not an error for the last character in the range to be `larger'
-than the first. In that case, the range runs backwards, i.e.,
-@samp{9-0} means the string @samp{9876543210}.
+contain character-ranges, e.g., @samp{a-z} (meaning all lowercase
+letters) or @samp{0-9} (meaning all digits). To include a dash @samp{-}
+in @var{chars} or @var{replacement}, place it first or last in the
+entire string, or as the last character of a range. Back-to-back ranges
+can share a common endpoint. It is not an error for the last character
+in the range to be `larger' than the first. In that case, the range
+runs backwards, i.e., @samp{9-0} means the string @samp{9876543210}.
+The expansion of a range is dependent on the underlying encoding of
+characters, so using ranges is not always portable between machines.
The macro @code{translit} is recognized only with parameters.
@end deffn
@@ -4153,17 +4154,31 @@ translit(`GNUs not Unix', `a-z', `A-Z')
@result{}GNUS NOT UNIX
translit(`GNUs not Unix', `A-Z', `z-a')
@result{}tmfs not fnix
+translit(`+,-12345', `+--1-5', `<;>a-c-a')
+@result{}<;>abcba
translit(`abcdef', `aabdef', `bcged')
@result{}bgced
@end example
-The first example deletes all uppercase letters, the second converts
-lowercase to uppercase, and the third `mirrors' all uppercase letters,
-while converting them to lowercase. The two first cases are by far the
-most common. The final example shows that @samp{a} is mapped to
-@samp{b}, not @samp{c}; the resulting @samp{b} is not further remapped
-to @samp{g}; the @samp{d} and @samp{e} are swapped, and the @samp{f} is
-discarded.
+In the @sc{ascii} encoding, the first example deletes all uppercase
+letters, the second converts lowercase to uppercase, and the third
+`mirrors' all uppercase letters, while converting them to lowercase.
+The two first cases are by far the most common, even though they are not
+portable to @sc{ebcdic} or other encodings. The fourth example shows a
+range ending in @samp{-}, as well as back-to-back ranges. The final
+example shows that @samp{a} is mapped to @samp{b}, not @samp{c}; the
+resulting @samp{b} is not further remapped to @samp{g}; the @samp{d} and
+@samp{e} are swapped, and the @samp{f} is discarded.
+
+@ignore
+@comment No need to fight 8-bit characters, as it is difficult to get
+@comment rendering right in both info and dvi.
+
+@example
+translit(`«abc~', `~-»')
+@result{}abc
+@end example
+@end ignore
Omitting @var{chars} evokes a warning, but still produces output.
Index: src/builtin.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/builtin.c,v
retrieving revision 1.1.1.1.2.50
diff -u -p -r1.1.1.1.2.50 builtin.c
--- src/builtin.c 1 Nov 2006 22:29:08 -0000 1.1.1.1.2.50
+++ src/builtin.c 11 Nov 2006 12:48:56 -0000
@@ -359,12 +359,12 @@ numeric_arg (token_data *macro, const ch
static char const digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
static const char *
-ntoa (register eval_t value, int radix)
+ntoa (eval_t value, int radix)
{
bool negative;
unsigned_eval_t uvalue;
static char str[256];
- register char *s = &str[sizeof str];
+ char *s = &str[sizeof str];
*--s = '\0';
@@ -667,9 +667,9 @@ m4_dumpdef (struct obstack *obs, int arg
/* Make table of symbols invisible to expand_macro (). */
- (void) obstack_finish (obs);
+ obstack_finish (obs);
- qsort ((char *) data.base, data.size, sizeof (symbol *), dumpdef_cmp);
+ qsort (data.base, data.size, sizeof (symbol *), dumpdef_cmp);
for (; data.size > 0; --data.size, data.base++)
{
@@ -1645,14 +1645,14 @@ m4_substr (struct obstack *obs, int argc
static const char *
expand_ranges (const char *s, struct obstack *obs)
{
- char from;
- char to;
+ unsigned char from;
+ unsigned char to;
- for (from = '\0'; *s != '\0'; from = *s++)
+ for (from = '\0'; *s != '\0'; from = to_uchar (*s++))
{
if (*s == '-' && from != '\0')
{
- to = *++s;
+ to = to_uchar (*++s);
if (to == '\0')
{
/* trailing dash */
@@ -1772,7 +1772,7 @@ static void
substitute (struct obstack *obs, const char *victim, const char *repl,
struct re_registers *regs)
{
- register unsigned int ch;
+ int ch;
for (;;)
{
@@ -2031,7 +2031,7 @@ void
expand_user_macro (struct obstack *obs, symbol *sym,
int argc, token_data **argv)
{
- register const char *text;
+ const char *text;
int i;
for (text = SYMBOL_TEXT (sym); *text != '\0';)