2011-10-1, 14:39(-08), rogerx....@gmail.com:
[...]
> I took some time to examine the three regex references:
>
> 1)
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04
> Written more like a technical specification of regex. Great if you're
> going to be modifying the regex code. Difficult to follow if you're new,
> looking for info.

One thing to bear in mind is that bash calls a system library to perform the regexp matching (but see [*]), so it can't really document how it's going to work because it just can't know: it may differ from system to system. The only thing that is more or less guaranteed is that all those various implementations should comply with that specification.

The link above is the specification of POSIX extended regular expressions, so a bash script writer should refer to that document if he wants to write a script for all the systems where bash might be used.

> 2) regex(7)
> Although it looks good, upon further examination, I start to see run-on
> sentences. It's more like a reference, which is what a man file should
> be.
> At the bottom, "AUTHOR - This page was taken from Henry Spencer's regex
> package"

On the few systems where that man page is available, it may or may not document the extended regular expressions that are used when calling the regex(3) API (on my system, it doesn't). Those regular expressions may or may not have extensions over the POSIX API, and that document may or may not point out which ones are extensions and which ones are not, so a script writer may be able to refer to that document if he wants his script to work on that particular system (but see [*]).

> 3) grep(1)
> Section "REGULAR EXPRESSIONS". At about half the size of regex(7), the
> section clearly explains regex and seems to be easily understandable for a
> person new to regex.

That's another utility that may or may not use the same API, and may or may not use it in the same way as bash. You get no warranty whatsoever that the regexps covered there will be the same as bash's.

[*] Actually, bash does some (undocumented) preprocessing on the regexps, so even the regex(3) reference is misleading here.

For instance, on my system the regex(3) extended REs support \1 for backreferences and \b for word boundaries, but when calling [[ aa =~ (.)\1 ]], bash changes it to [[ aa =~ (.)1 ]], so bash won't behave as regex(3) documents on my system. (Note that (.)\1 is not a portable regex, as the behavior is unspecified.)

Also (and that could be considered a bug), "[\a]" is meant to match either "\" or "a", but in bash, because of that preprocessing, it doesn't:

$ bash -c '[[ "\\" =~ [\a] ]]' || echo no
no
$ bash -c '[[ "\\" =~ [\^] ]]' && echo yes
yes

Once that bug is fixed, bash should probably refer to POSIX EREs (since its preprocessing would disable any extension introduced by system libraries) rather than regex(3), as that would be more accurate.

The situation with zsh:

- it uses the same API as bash (unless the RE_MATCH_PCRE option is set, in which case it uses PCRE regexps)
- it doesn't do the same preprocessing as bash because...
- it doesn't implement that confusing business inherited from ksh whereby quoted RE characters are taken literally.

So, in zsh:

- [[ aa =~ '(.)\1' ]] works as documented in regex(3) on my system (but may work differently on other systems, as the behavior is unspecified as per POSIX).
- [[ '\' =~ '[\a]' ]] works as POSIX specifies
- after "setopt RE_MATCH_PCRE", one gets a more portable behavior, as there is only one PCRE library (though in different versions).

The situation with ksh93:

- Not POSIX either, but a bit more consistent:

$ ksh -c '[[ "\\" =~ [\a] ]]' || echo no
no
$ ksh -c '[[ "\\" =~ [\^] ]]' || echo no
no

- it implements its own regexps with its own many extensions, which therefore can be and are documented in its man page, but which are not common to any other regex implementation (though they are mostly a superset of POSIX EREs).

--
Stephane
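
The ksh-inherited quoting rule mentioned above (quoted RE characters are taken literally) can be illustrated with a minimal sketch. It assumes bash 3.2 or later with default compatibility settings; the variable names (re, unquoted, quoted, literal) are only illustrative and do not come from the original message. The pattern is restricted to plain POSIX ERE syntax, so the system regex library should not affect the result.

```shell
# Assumption: run with bash 3.2+ (default compat settings), not plain sh.
re='a.c'   # in a POSIX ERE, "." matches any single character

# Unquoted expansion: the value of $re is used as a regular expression.
if [[ abc =~ $re ]]; then unquoted=match; else unquoted=nomatch; fi

# Quoted expansion: every character is matched literally, so "." can
# only match a literal dot and "abc" no longer matches.
if [[ abc =~ "$re" ]]; then quoted=match; else quoted=nomatch; fi

# The literal string "a.c" still matches the quoted pattern.
if [[ a.c =~ "$re" ]]; then literal=match; else literal=nomatch; fi

printf '%s %s %s\n' "$unquoted" "$quoted" "$literal"
# prints: match nomatch match
```

Keeping the regex in a variable and expanding it unquoted is a common way to avoid mixing shell quoting with RE syntax inside [[ ... ]]; it does not by itself make non-POSIX extensions such as \1 or \b portable.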