This seems to fix the issue for me.
OK?
martijn@
On Tue, 2020-06-23 at 19:29 -0700, Jordan Geoghegan wrote:
> Hello,
>
> I was working on a couple POSIX regular expressions to search for and
> validate IPv4 and IPv6 addresses with optional CIDR blocks, and
> encountered some strange behaviour from the base system grep.
>
> I wanted to validate my regex against a list of every valid IPv4
> address, so I generated a list with a zsh 1 liner:
>
> for i in {0..255}; do; echo $i.{0..255}.{0..255}.{0..255} ; done |
> tr '[:space:]' '\n' > IPv4.txt
>
> My intentions were to test the regex by running it with 'grep -c' to
> confirm there was indeed 2^32 addresses matched, and I also wanted to
> benchmark and compare performance between BSD grep, GNU grep and
> ripgrep. The command I used:
>
> grep -Eoc
> "((25[0-5]|(2[0-4]|1{0,1}[[:digit:]]){0,1}[[:digit:]])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[[:digit:]]){0,1}[[:digit:]])(/[1-9]|/[1-2][[:digit:]]|/3[0-2])?"
>
> My findings were surprising. Both GNU grep and ripgrep were able get
> through the file in roughly 10 and 20 minutes respectively, whereas the
> base system grep took over 20 hours! What interested me the most was
> that the base system grep when run with '-c' returned '0' for match
> count. It seems that 'grep -c' will have its counter overflow if there
> are more than 2^32-1 matches (4294967295) and then the counter will
> start counting from zero again for further matches.
>
> ryzen$ time zcat IPv4.txt.gz | grep -Eoc "((25[0-5]|(2[0-4]|1{0,1}...
> 0
> 1222m09.32s real 1224m28.02s user 1m16.17s system
>
> ryzen$ time zcat allip.txt.gz | ggrep -Eoc "((25[0-5]|(2[0-4]|1{0,1}...
> 4294967296
> 10m00.38s real 11m40.57s user 0m30.55s system
>
> ryzen$ time rg -zoc "((25[0-5]|(2[0-4]|1{0,1}...
> 4294967296
> 21m06.36s real 27m06.04s user 0m50.08s system
>
> # See the counter overflow/reset:
> jot 4294967350 | grep -c "^[[:digit:]]"
> 54
>
> All testing was done on a Ryzen desktop machine running 6.7 stable.
>
> The grep counting bug can be reproduced with this command:
> jot 4294967296 | nice grep -c "^[[:digit:]]"
>
> Regards,
>
> Jordan
>
Index: util.c
===================================================================
RCS file: /cvs/src/usr.bin/grep/util.c,v
retrieving revision 1.62
diff -u -p -r1.62 util.c
--- util.c 3 Dec 2019 09:14:37 -0000 1.62
+++ util.c 24 Jun 2020 06:46:52 -0000
@@ -106,7 +106,8 @@ procfile(char *fn)
{
str_t ln;
file_t *f;
- int c, t, z, nottext;
+ int t, z, nottext;
+ unsigned long long c;
mcount = mlimit;
@@ -169,7 +170,7 @@ procfile(char *fn)
if (cflag) {
if (!hflag)
printf("%s:", ln.file);
- printf("%u\n", c);
+ printf("%llu\n", c);
}
if (lflag && c != 0)
printf("%s\n", fn);