Potential grep bug?

Jordan Geoghegan Tue, 23 Jun 2020 19:31:31 -0700

Hello,

I was working on a couple of POSIX regular expressions to search for and validate IPv4 and IPv6 addresses with optional CIDR blocks, and encountered some strange behaviour from the base system grep.

I wanted to validate my regex against a list of every valid IPv4 address, so I generated a list with a zsh one-liner:

for i in {0..255}; do echo $i.{0..255}.{0..255}.{0..255}; done | tr '[:space:]' '\n' > IPv4.txt
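Since the full list has 2^32 lines, a scaled-down generator can be handy for quick sanity checks before committing to a multi-hour run. The following is only a sketch in POSIX sh: the function name gen_slash16 and the 10.0 prefix are hypothetical choices for illustration, and it emits a single /16 (65536 addresses) rather than the whole address space.

```shell
#!/bin/sh
# Sketch: print every address under a two-octet prefix, one per line.
# gen_slash16 and the "10.0" prefix are illustrative, not from the original post.
gen_slash16() {
    prefix=$1
    c=0
    while [ "$c" -le 255 ]; do
        d=0
        while [ "$d" -le 255 ]; do
            printf '%s.%s.%s\n' "$prefix" "$c" "$d"
            d=$((d + 1))
        done
        c=$((c + 1))
    done
}

# Count the generated lines; tr strips the padding some wc implementations add.
sample_count=$(gen_slash16 10.0 | wc -l | tr -d ' ')
echo "$sample_count"
```

Piping the output of gen_slash16 into the grep under test gives a count that can be checked by hand (256 * 256 lines) in a second or two.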

My intention was to test the regex by running it with 'grep -c' to confirm that 2^32 addresses were indeed matched, and I also wanted to benchmark and compare performance between BSD grep, GNU grep and ripgrep. The command I used:

grep -Eoc "((25[0-5]|(2[0-4]|1{0,1}[[:digit:]]){0,1}[[:digit:]])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[[:digit:]]){0,1}[[:digit:]])(/[1-9]|/[1-2][[:digit:]]|/3[0-2])?"
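A quick way to sanity-check the expression on a handful of inputs is sketched below. Note one deliberate difference from the command above: it uses '-x' (whole-line match) instead of '-o', which is my assumption for the test, because the unanchored regex would otherwise match a substring such as "56.1.1.1" inside "256.1.1.1". The sample addresses and variable names are made up for illustration.

```shell
#!/bin/sh
# Same ERE as in the post, held in a variable (single quotes keep \. intact).
re='((25[0-5]|(2[0-4]|1{0,1}[[:digit:]]){0,1}[[:digit:]])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[[:digit:]]){0,1}[[:digit:]])(/[1-9]|/[1-2][[:digit:]]|/3[0-2])?'

# Three valid samples followed by three invalid ones:
# an out-of-range octet, too few octets, and an out-of-range prefix length.
whole_line_matches=$(printf '%s\n' \
    192.168.0.1 10.0.0.0/8 255.255.255.255/32 \
    256.1.1.1 1.2.3 1.2.3.4/33 |
    grep -Exc "$re")

echo "$whole_line_matches"
```

With whole-line matching only the first three samples should be counted, so this prints 3.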

My findings were surprising. GNU grep and ripgrep got through the file in roughly 10 and 20 minutes respectively, whereas the base system grep took over 20 hours! What interested me the most was that the base system grep, when run with '-c', returned '0' for the match count. It seems that the 'grep -c' counter overflows once there are more than 2^32-1 (4294967295) matches and then starts counting from zero again for further matches.


    ryzen$ time zcat IPv4.txt.gz | grep -Eoc "((25[0-5]|(2[0-4]|1{0,1}...
    0
    1222m09.32s real  1224m28.02s user     1m16.17s system

    ryzen$ time zcat allip.txt.gz | ggrep -Eoc "((25[0-5]|(2[0-4]|1{0,1}...
    4294967296
    10m00.38s real    11m40.57s user     0m30.55s system

    ryzen$ time rg -zoc "((25[0-5]|(2[0-4]|1{0,1}...
    4294967296
    21m06.36s real    27m06.04s user     0m50.08s system

# See the counter overflow/reset:
    jot 4294967350 | grep -c "^[[:digit:]]"
    54
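Both observed counts (0 for 4294967296 matching lines, 54 for 4294967350) are what a counter that wraps modulo 2^32 would produce. The arithmetic can be checked in the shell; this is only a sketch of the suspected behaviour, not a statement about grep's source, and it assumes a shell whose $(( )) arithmetic is at least 64 bits wide.

```shell
#!/bin/sh
# 2^32: the point at which a 32-bit unsigned counter wraps back to zero.
modulus=4294967296

# Expected '-c' output if the true match count is reduced modulo 2^32.
wrapped_full=$(( 4294967296 % modulus ))   # the full IPv4 list
wrapped_jot=$(( 4294967350 % modulus ))    # the jot example

echo "$wrapped_full $wrapped_jot"
```

This prints "0 54", matching the two results reported by the base system grep.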

All testing was done on a Ryzen desktop machine running 6.7 stable.

The grep counting bug can be reproduced with this command:
   jot 4294967296 | nice grep -c "^[[:digit:]]"

Regards,

Jordan
