On Tue, May 20, 2025 at 19:08:10 +0000, FunnyMan Computer wrote:
> I failed multiple times on getting similar results to what I was
> expecting from using grep just using the [a-z] and [a-z]+ classes -
> expecting multiple results from $BASH_REMATCH but it's only picking
> up 1 character at most, while grep -E is able to pick up all the
> characters (which is weird, since the class [a-z]+$ gives completely
> similar results).

My first reaction: [a-z] is dangerous.  It matches *some* definition of
"a character between lowercase a and lowercase z, inclusive", but
depending on your locale, this may or may not be equivalent to "the set
of lowercase letters in the ASCII character set".

If you want [a-z] to work like ASCII does, you'll need to use LC_CTYPE=C.
If you want to match lowercase letters in your current locale, you should
use [[:lower:]] instead.

Moving along....

> So, I was wondering whether this was a bug or intended and I'm just
> misinterpreting how bash does regular expressions. I tried reading the bash
> manual on the '=~' operator,
> Repeat-By:
> grep:
> `$ echo test-test | POSIXLY_CORRECT=1 grep -E [a-z]`
> `^test^-^test^`
>
> `$ echo test-tesst | POSIXLY_CORRECT=1 grep -E [a-z]+`
> `^test^-^tesst^`

Second reaction: you forgot to quote the regex.  It might match a file
in the current working directory and be replaced by the shell.

Assuming that there are no matching files in your current directory....

In both examples, you have a single line of input, and the line happens
to match the regex you gave.  So, grep prints that line.

> bash's '=~' and $BASH_REMATCH:
> ```
> $ if [[ test-test =~ [a-z] ]]; then
> for i in "${!BASH_REMATCH[@]}"; do
> echo "$i: ${BASH_REMATCH[$i]}";
> done
> fi
> ```
> `0: t`

You can use "declare -p BASH_REMATCH" to show the array more easily.

Now, repeating what appears to be your bug report:

> expecting multiple results from $BASH_REMATCH but it's only picking
> up 1 character at most, while grep -E is able to pick up all the
> characters (which is weird, since the class [a-z]+$ gives completely
> similar results).

grep always prints whole lines.  It doesn't just print the matching part
of a line.

BASH_REMATCH stores matching parts of the input string.  Index 0 stores
the whole matching substring, and indexes 1+ store pieces that match
parenthesized sub-expressions (which you're not currently using).

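To make the index 0 versus index 1+ distinction concrete, here is a minimal sketch that does use parenthesized sub-expressions.  The input string "test-ing" and the variable names (re, whole, left, right) are illustrative assumptions, not anything from the original report.

```bash
#!/usr/bin/env bash
# Keep the regex in a variable so the shell does not try to parse it,
# and leave $re unquoted on the right-hand side of =~ so that it is
# treated as a regex rather than a literal string.
re='([[:lower:]]+)-([[:lower:]]+)'

if [[ test-ing =~ $re ]]; then
    whole=${BASH_REMATCH[0]}   # the entire matched substring
    left=${BASH_REMATCH[1]}    # first parenthesized sub-expression
    right=${BASH_REMATCH[2]}   # second parenthesized sub-expression
    printf 'whole=%s left=%s right=%s\n' "$whole" "$left" "$right"
fi
```

With this input the whole match is "test-ing", and the two groups capture "test" and "ing".  There is still exactly one match; the extra array elements are pieces of it, not additional matches.
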
hobbit:~$ echo "$BASH_VERSION"
5.2.15(1)-release
hobbit:~$ [[ test-test =~ [[:lower:]] ]] ; declare -p BASH_REMATCH
declare -a BASH_REMATCH=([0]="t")

In the above example, [[:lower:]] matches a single lowercase letter in
my locale, and the first such letter is 't'.  So, that's what gets
matched and stored in BASH_REMATCH[0].

hobbit:~$ [[ test-test =~ [[:lower:]]+ ]] ; declare -p BASH_REMATCH
declare -a BASH_REMATCH=([0]="test")

In the above example, [[:lower:]]+ matches a sequence of one or more
lowercase letters in my locale.  POSIX extended regular expressions,
which are what =~ uses, take the longest possible match starting at the
leftmost position where a match exists, so it will match as many letters
as possible.  In this case, the string "test" is matched and stored.

hobbit:~$ [[ test-ing =~ [[:lower:]]+$ ]] ; declare -p BASH_REMATCH
declare -a BASH_REMATCH=([0]="ing")

In the above example, I'm using [[:lower:]]+$ because you mentioned
wanting to use [a-z]+$ earlier.  I also changed the input string so that
we can see whether it's matching the left hand side or the right hand
side of the input.  In this case, it matches the right hand side.
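
If the goal is to collect every run of lowercase letters rather than only the first one, a common workaround is to match repeatedly and strip off everything up to and including each match.  The following is only a sketch under assumptions: the variable names (s, re, matches) are made up for illustration, and the loop relies on the regex never matching the empty string.

```bash
#!/usr/bin/env bash
s=test-tesst
re='[[:lower:]]+'   # must not be able to match an empty string,
                    # otherwise the loop below would never terminate
matches=()

while [[ $s =~ $re ]]; do
    matches+=("${BASH_REMATCH[0]}")
    # Remove the shortest prefix that ends with the matched text.
    # Quoting the expansion makes it a literal string, not a glob.
    s=${s#*"${BASH_REMATCH[0]}"}
done

declare -p matches
```

For the input "test-tesst" the array ends up holding two elements, "test" and "tesst".  Each pass through the loop is still a single match with a single BASH_REMATCH[0]; the list of results is built by the loop, not by =~ itself.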