On 2020-06-23 22:29, Jordan Geoghegan wrote:
> Hello,
>
> I was working on a couple POSIX regular expressions to search for and
> validate IPv4 and IPv6 addresses with optional CIDR blocks, and encountered
> some strange behaviour from the base system grep.
>
> I wanted to validate my regex against a list of every valid IPv4 address, so
> I generated a list with a zsh 1 liner:
>
> for i in {0..255}; do; echo $i.{0..255}.{0..255}.{0..255} ; done | tr
> '[:space:]' '\n' > IPv4.txt
>
> My intentions were to test the regex by running it with 'grep -c' to confirm
> there was indeed 2^32 addresses matched, and I also wanted to benchmark and
> compare performance between BSD grep, GNU grep and ripgrep. The command I
> used:
>
> grep -Eoc
> "((25[0-5]|(2[0-4]|1{0,1}[[:digit:]]){0,1}[[:digit:]])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[[:digit:]]){0,1}[[:digit:]])(/[1-9]|/[1-2][[:digit:]]|/3[0-2])?"
>
> My findings were surprising. Both GNU grep and ripgrep were able get through
> the file in roughly 10 and 20 minutes respectively, whereas the base system
> grep took over 20 hours! What interested me the most was that the base system
> grep when run with '-c' returned '0' for match count. It seems that 'grep -c'
> will have its counter overflow if there are more than 2^32-1 matches
> (4294967295) and then the counter will start counting from zero again for
> further matches.Does OpenBSD use an NFA/DFA regular expression implementation, or a backtracking one? If it uses the latter, then your regex is probably causing catastrophic backtracking. > Regards, > > Jordan Sincerely, Demi
signature.asc
Description: OpenPGP digital signature

