Re: 1.5, 1.7: Bash regex not recognizing word boundaries
On Mon, Oct 19, 2009 at 3:30 PM, Eric Blake e...@byu.net wrote:

> Indeed - \b is a GNU extension available in glibc's regcomp(), but not required by POSIX nor available in newlib. Unless/until someone contributes patches to write the same extensions to the POSIX interface, then bash won't be able to make use of those extensions.
>
> One other option would be to ask the upstream bash project if the maintainer would be willing to pull in GNU regex.c on platforms where regcomp() is POSIX-compliant but lacks GNU extensions. But it's unfortunately not on the top of my priority list.

I see, thank you. After a more thorough search of the archives, I see the issue of regcomp not recognizing '\b' as word boundaries came up before:

http://www.cygwin.com/ml/cygwin/2006-03/msg00362.htm

I was relying on the man page for egrep as my guide to regex syntax. I'll now stick to the POSIX compliant subset [1]. If I find the need for more powerful regex, I'll write the script in perl or python.

[1]: http://www.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html

Allen

--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Re: 1.5, 1.7: Bash regex not recognizing word boundaries
On Mon, Oct 19, 2009 at 4:50 PM, Allen Halsey wrote:

> These should print Matched, but they don't:
>
> $ REGEX='\bcat\b'
> $ [[ "dog cat bird" =~ $REGEX ]] && echo Matched
>
> $ REGEX='\<cat\>'
> $ [[ "dog cat bird" =~ $REGEX ]] && echo Matched

It's worth noting that this is not limited to Cygwin; I'm seeing the same behavior on OS X (with the same version of bash as my Linux system where the above works as intended). I suspect it's a factor of the regex library used to build bash rather than bash itself.
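Since the behavior seems to depend on the regex library bash was built against, a script could probe for it at run time instead of guessing from the platform name. The following is only a sketch: the function name regex_supports_b is made up for illustration, and the probe checks just the two sample strings shown.

```shell
#!/usr/bin/env bash
# regex_supports_b (hypothetical helper): succeed only if the regex library
# behind [[ =~ ]] appears to treat \b as a word boundary.
regex_supports_b() {
    local re='\bcat\b'
    # Must match "cat" as a standalone word ...
    [[ "dog cat bird" =~ $re ]] &&
    # ... and must not match "cat" embedded inside a longer word.
    ! [[ "concatenate" =~ $re ]]
}

if regex_supports_b; then
    echo '\b works as a word boundary here'
else
    echo '\b is not supported here; use POSIX-only patterns'
fi
```

The pattern is kept in a variable and expanded unquoted on the right-hand side of =~, the same way the original test case does it.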
Re: 1.5, 1.7: Bash regex not recognizing word boundaries
According to Mark J. Reed on 10/19/2009 6:34 PM:

> On Mon, Oct 19, 2009 at 4:50 PM, Allen Halsey wrote:
>
>> These should print Matched, but they don't:
>>
>> $ REGEX='\bcat\b'
>> $ [[ "dog cat bird" =~ $REGEX ]] && echo Matched
>>
>> $ REGEX='\<cat\>'
>> $ [[ "dog cat bird" =~ $REGEX ]] && echo Matched
>
> It's worth noting that this is not limited to Cygwin; I'm seeing the same behavior on OS X (with the same version of bash as my Linux system where the above works as intended). I suspect it's a factor of the regex library used to build bash rather than bash itself.

Indeed - \b is a GNU extension available in glibc's regcomp(), but not required by POSIX nor available in newlib. Unless/until someone contributes patches to write the same extensions to the POSIX interface, then bash won't be able to make use of those extensions.

One other option would be to ask the upstream bash project if the maintainer would be willing to pull in GNU regex.c on platforms where regcomp() is POSIX-compliant but lacks GNU extensions. But it's unfortunately not on the top of my priority list.

--
Don't work too hard, make some time for fun as well!

Eric Blake e...@byu.net
volunteer cygwin bash maintainer
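For anyone who needs the original test to work without \b or \< \>, one possible workaround is to spell out the word boundary using only POSIX ERE constructs. This is a sketch, not something proposed in the thread: the helper name has_word is made up, "word character" is assumed to mean alphanumeric or underscore, and the word being searched for is assumed to contain no regex metacharacters.

```shell
#!/usr/bin/env bash
# has_word WORD TEXT (hypothetical helper): succeed if WORD occurs in TEXT
# as a whole word. Uses only POSIX ERE syntax, so it does not rely on the
# GNU extensions \b, \< or \>.
# Assumption: WORD contains no regex metacharacters.
has_word() {
    local word=$1 text=$2
    # Boundary on each side: start/end of string, or a character that is
    # neither alphanumeric nor underscore.
    local re="(^|[^[:alnum:]_])${word}([^[:alnum:]_]|\$)"
    [[ $text =~ $re ]]
}

has_word cat "dog cat bird" && echo Matched
has_word cat "concatenate" || echo "Not matched"
```

Unlike \b, this pattern consumes the neighbouring character, so BASH_REMATCH[0] may include a leading or trailing separator. That is harmless for a yes/no test like the one above but matters if the matched text is used afterwards.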