[Try again: Misspelt the mailing list name the first time]

------- Forwarded Message

Date: Fri, 08 Oct 2010 13:57:43 -0400
From: Nick Dokos <[email protected]>
To: [email protected]
cc: [email protected], "Eric S. Fraga" <[email protected]>, Suvayu Ali <[email protected]>
Subject: texi2dvi egrep regexp

There was a discussion about some problems with the egrep regexp that
texi2dvi uses back in March 2010 in the thread entitled

    texi2dvi: locale-dependent error in egrep [A-z]

(see http://lists.gnu.org/archive/html/bug-texinfo/2010-03/msg00031.html
and following). Has anything come of that?

The reason I am asking is that recently emacs org-mode tried to switch
to texi2dvi for org->pdf exporting and several people have reported
this problem. The underlying reason seems to be that recent versions of
egrep check range expressions more strictly: e.g. Fedora 13 uses grep
version 2.6.3 and egrep fails the range check. OTOH, Ubuntu 10.04 uses
grep version 2.5.4: egrep does not fail there.

The egrep manual page says:

    Within a bracket expression, a range expression consists of two
    characters separated by a hyphen. It matches any single character
    that sorts between the two characters, inclusive, using the
    locale’s collating sequence and character set. For example, in the
    default C locale, [a-d] is equivalent to [abcd]. Many locales sort
    characters in dictionary order, and in these locales [a-d] is
    typically not equivalent to [abcd]; it might be equivalent to
    [aBbCcDd], for example. To obtain the traditional interpretation
    of bracket expressions, you can use the C locale by setting the
    LC_ALL environment variable to the value C.

    Finally, certain named classes of characters are predefined within
    bracket expressions, as follows. Their names are self explanatory,
    and they are [:alnum:], [:alpha:], [:cntrl:], [:digit:], [:graph:],
    [:lower:], [:print:], [:punct:], [:space:], [:upper:], and [:xdigit:].
    For example, [[:alnum:]] means [0-9A-Za-z], except the latter form
    depends upon the C locale and the ASCII character encoding, whereas
    the former is independent of locale and character set. (Note that
    the brackets in these class names are part of the symbolic names,
    and must be included in addition to the brackets delimiting the
    bracket expression.) Most meta-characters lose their special
    meaning inside bracket expressions. To include a literal ] place it
    first in the list. Similarly, to include a literal ^ place it
    anywhere but first. Finally, to include a literal - place it last.

Given that, would it make sense to replace the egrep invocation in
texi2dvi with

    egrep '^(/|[[:alpha:]]:/)'

which would be valid under any locale? (Per the note quoted above, the
class name needs its own brackets inside the bracket expression, hence
[[:alpha:]] rather than [:alpha:].) It does not include the ASCII
characters between 'Z' and 'a', which (I was surprised to find out from
Eli's response) could be drive letters, but as Eli also points out,
those are probably never used nowadays.

Thanks,
Nick

------- End of Forwarded Message
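
A quick way to try the proposed pattern by hand is sketched below. It is
only an illustration, not the texi2dvi code: the function name
looks_absolute and the sample paths are made up for the test, grep -E is
used as the equivalent of egrep, and the class is written [[:alpha:]],
following the manual page's note that the class name needs its own
brackets inside the bracket expression.

```shell
# Sketch only: check sample strings against the locale-independent pattern.
# "looks_absolute" is a hypothetical helper, not part of texi2dvi.
pattern='^(/|[[:alpha:]]:/)'

looks_absolute() {
  # grep -E is the POSIX spelling of egrep; -q suppresses output and
  # reports the result through the exit status alone.
  printf '%s\n' "$1" | grep -Eq "$pattern"
}

# Sample inputs (assumed for illustration): a Unix path, two drive-letter
# paths, a "drive" taken from the characters between 'Z' and 'a', and two
# relative paths.
for p in /usr/share/texmf c:/texmf Z:/tex _:/tex relative/path ./foo; do
  if looks_absolute "$p"; then
    echo "$p: matches"
  else
    echo "$p: does not match"
  fi
done
```

The first three samples should match and the last three should not,
whatever LC_ALL is set to, since the pattern contains no range
expression for the locale's collating sequence to reinterpret.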
