Re: 1.5, 1.7: Bash regex not recognizing word boundaries

2009-10-20 Thread Allen Halsey
On Mon, Oct 19, 2009 at 3:30 PM, Eric Blake e...@byu.net wrote:

> Indeed - \b is a GNU extension available in glibc's regcomp(), but not
> required by POSIX nor available in newlib.  Unless/until someone
> contributes patches to write the same extensions to the POSIX interface,
> bash won't be able to make use of those extensions.  One other option
> would be to ask the upstream bash project if the maintainer would be
> willing to pull in GNU regex.c on platforms where regcomp() is
> POSIX-compliant but lacks GNU extensions.  But it's unfortunately not at
> the top of my priority list.


I see, thank you.

After a more thorough search of the archives, I see the issue of
regcomp not recognizing '\b' as a word boundary came up before:

  http://www.cygwin.com/ml/cygwin/2006-03/msg00362.htm

I was relying on the man page for egrep as my guide to regex syntax.

I'll now stick to the POSIX-compliant subset [1]. If I find the need
for more powerful regexes, I'll write the script in Perl or Python.

[1]: http://www.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
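
For what it's worth, a whole-word match can be approximated inside that
POSIX subset without \b. The following is only a sketch: has_word is a
made-up helper name, it assumes the word itself contains no regex
metacharacters, and it treats [[:alnum:]_] as the set of word characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bash): succeed if WORD occurs in TEXT
# as a whole word, using only POSIX ERE constructs.
# Assumption: WORD contains no regex metacharacters.
has_word() {
    local text=$1 word=$2
    # Either the start of the string or a non-word character must precede
    # the word, and a non-word character or the end of the string must follow.
    local re="(^|[^[:alnum:]_])${word}([^[:alnum:]_]|\$)"
    # The pattern variable is left unquoted so bash treats it as a regex.
    [[ $text =~ $re ]]
}

has_word "dog cat bird" cat && echo Matched
has_word "dog concatenate bird" cat || echo "No match"
```

Unlike \b, the match in BASH_REMATCH[0] can include the neighbouring
separator characters, so this is only suitable for yes/no tests.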

Allen

--
Problem reports:   http://cygwin.com/problems.html
FAQ:   http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info:  http://cygwin.com/ml/#unsubscribe-simple



Re: 1.5, 1.7: Bash regex not recognizing word boundaries

2009-10-19 Thread Mark J. Reed
On Mon, Oct 19, 2009 at 4:50 PM, Allen Halsey wrote:
> These should print Matched, but they don't:
>
> $ REGEX='\bcat\b'
> $ [[ "dog cat bird" =~ $REGEX ]] && echo Matched
> $ REGEX='\<cat\>'
> $ [[ "dog cat bird" =~ $REGEX ]] && echo Matched

It's worth noting that this is not limited to Cygwin; I'm seeing the
same behavior on OS X (with the same version of bash as my Linux
system where the above works as intended).   I suspect it's a factor
of the regex library used to build bash rather than bash itself.
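
One way to check that suspicion on a given machine is to probe which
word-boundary spellings the local bash accepts. This is a rough sketch:
regex_supports is a hypothetical helper, the list of spellings is my own
choice (the [[:<:]] and [[:>:]] form is the BSD-style one), and a spelling
only counts as supported if it actually matches the test string.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bash): succeed if the given pattern
# matches "cat" as a word in a fixed test string.
regex_supports() {
    local re=$1
    # [[ =~ ]] returns 0 on a match, 1 on no match, and 2 when the regex
    # library rejects the pattern; only 0 is treated as "supported".
    [[ "dog cat bird" =~ $re ]] 2>/dev/null
}

# GNU-style, BSD-style, and a plain POSIX ERE emulation for comparison.
for re in '\bcat\b' '\<cat\>' '[[:<:]]cat[[:>:]]' \
          '(^|[^[:alnum:]_])cat([^[:alnum:]_]|$)'; do
    if regex_supports "$re"; then
        printf '%-42s supported\n' "$re"
    else
        printf '%-42s not supported\n' "$re"
    fi
done
```

Which lines report "supported" depends on the regex library bash was
linked against, so the output will differ between systems.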




Re: 1.5, 1.7: Bash regex not recognizing word boundaries

2009-10-19 Thread Eric Blake
According to Mark J. Reed on 10/19/2009 6:34 PM:
> On Mon, Oct 19, 2009 at 4:50 PM, Allen Halsey wrote:
>> These should print Matched, but they don't:
>>
>> $ REGEX='\bcat\b'
>> $ [[ "dog cat bird" =~ $REGEX ]] && echo Matched
>> $ REGEX='\<cat\>'
>> $ [[ "dog cat bird" =~ $REGEX ]] && echo Matched
>
> It's worth noting that this is not limited to Cygwin; I'm seeing the
> same behavior on OS X (with the same version of bash as my Linux
> system where the above works as intended).   I suspect it's a factor
> of the regex library used to build bash rather than bash itself.

Indeed - \b is a GNU extension available in glibc's regcomp(), but not
required by POSIX nor available in newlib.  Unless/until someone
contributes patches to write the same extensions to the POSIX interface,
bash won't be able to make use of those extensions.  One other option
would be to ask the upstream bash project if the maintainer would be
willing to pull in GNU regex.c on platforms where regcomp() is
POSIX-compliant but lacks GNU extensions.  But it's unfortunately not at
the top of my priority list.

--
Don't work too hard, make some time for fun as well!

Eric Blake e...@byu.net
volunteer cygwin bash maintainer