Hello,
I was hoping the fine folks here could give me a quick sanity check. I'm
by no means an awk guru, so I'm likely missing something obvious. I
wanted to ask here quickly before I started flapping my gums on bugs@.
I'm working on a simple awk snippet to convert the IP range data listed
in the Extended Delegation Statistics file from ARIN [1] into CIDR
blocks. I have a snippet that works perfectly fine on mawk and gawk, but
not on the base system awk. I'm 99% sure I'm not using any GNUisms,
because when I break the command up into two parts, it works perfectly
on base awk too.
The snippet below does not work with base awk, but does work with gawk
and mawk (running on a 6.6 -stable system):
awk -F '|' '{ if ( $3 == "ipv4" && $2 == "US") printf("%s/%d\n", $4,
32-log($5)/log(2))}' delegated-arin-extended-latest.txt
The command does output data, but it also throws errors for certain lines:
awk: log result out of range
input record number 94027, file delegated-arin-extended-latest.txt
source line number 1
Most CIDR blocks are calculated correctly, but about 10% of them are
wrong (i.e. something that should be calculated as a /24 is instead
calculated as a /30).
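To see what base awk is looking at when it complains, the record named
in the error can be pulled out on its own. This is only a minimal
sketch: show_record is a hypothetical helper name, and it assumes the
address count sits in field 5, as in the command above.

```shell
# Hypothetical helper: print record number $1 of file $2, followed by
# the field that the original command passes to log().
show_record() {
    awk -F '|' -v n="$1" '
        NR == n {
            printf("%s\ncount field: [%s]\n", $0, $5)
            exit
        }' "$2"
}

# Example call, using the record number from the error message:
#   show_record 94027 delegated-arin-extended-latest.txt
```

The brackets around the field make stray whitespace or an empty value
easy to spot.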
However, when I break it up into two parts, it produces the expected output:
awk -F '|' '{ if ( $3 == "ipv4" && $2 == "US") print($4, $5)}'
delegated-arin-extended-latest.txt | awk '{printf("%s/%d\n", $1,
32-log($2)/log(2)) }'
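As a cross-check that sidesteps log() entirely, the prefix length can
also be derived by repeated halving. This is only a sketch, not a fix
for whatever is going wrong in base awk: range_to_cidr is a hypothetical
helper name, the sample record is made up in the same field layout, and,
like the original, the result is only meaningful when the address count
is a power of two.

```shell
# Hypothetical helper: same filter as the original command, but the
# number of host bits is counted by halving instead of calling log().
range_to_cidr() {
    awk -F '|' '
        $3 == "ipv4" && $2 == "US" {
            n = $5
            bits = 0
            while (n > 1) {      # one halving per host bit
                n /= 2
                bits++
            }
            printf("%s/%d\n", $4, 32 - bits)
        }' "$@"
}

# Made-up sample record: 16777216 addresses = 2^24, so a /8.
printf 'arin|US|ipv4|12.0.0.0|16777216|19830823|allocated|example-id\n' |
    range_to_cidr
# prints 12.0.0.0/8
```

With no file arguments the function reads standard input; with file
names it behaves like the original one-liner, e.g.
range_to_cidr delegated-arin-extended-latest.txt.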
As you can see below, all three implementations print the same number of
lines, but the md5 hash of the base awk output differs from the other
two:
luna$ gawk -F '|' '{ if ( $3 == "ipv4" && $2 == "US")
printf("%s/%d\n", $4, 32-log($5)/log(2))}' delegated-*-latest.txt | wc -l
56446
luna$ mawk -F '|' '{ if ( $3 == "ipv4" && $2 == "US")
printf("%s/%d\n", $4, 32-log($5)/log(2))}' delegated-*-latest.txt | wc -l
56446
luna$ awk -F '|' '{ if ( $3 == "ipv4" && $2 == "US")
printf("%s/%d\n", $4, 32-log($5)/log(2))}' delegated-*-latest.txt
2>/dev/null | wc -l
56446
luna$ awk -F '|' '{ if ( $3 == "ipv4" && $2 == "US")
printf("%s/%d\n", $4, 32-log($5)/log(2))}'
delegated-arin-extended-latest.txt 2>/dev/null | md5
6f549bbc0799bc202c12695f8530d1df
luna$ gawk -F '|' '{ if ( $3 == "ipv4" && $2 == "US")
printf("%s/%d\n", $4, 32-log($5)/log(2))}'
delegated-arin-extended-latest.txt 2>/dev/null | md5
40c28b8ebfd2796e1ae15d9f6401c0c1
luna$ mawk -F '|' '{ if ( $3 == "ipv4" && $2 == "US")
printf("%s/%d\n", $4, 32-log($5)/log(2))}'
delegated-arin-extended-latest.txt 2>/dev/null | md5
40c28b8ebfd2796e1ae15d9f6401c0c1
Example of the differences:
--- mawk.txt Sat Jun 6 18:43:30 2020
+++ awk.txt Sat Jun 6 18:43:38 2020
@@ -29,7 +29,7 @@
9.64.0.0/10
9.128.0.0/9
11.0.0.0/8
-12.0.0.0/8
+12.0.0.0/30
13.0.0.0/11
13.32.0.0/12
13.48.0.0/14
@@ -415,7 +415,7 @@
23.90.64.0/20
23.90.80.0/21
23.90.88.0/22
-23.90.92.0/22
+23.90.92.0/30
23.90.96.0/19
23.91.0.0/19
23.91.32.0/19
@@ -545,8 +545,8 @@
23.133.224.0/24
23.133.240.0/24
23.134.0.0/24
-23.134.16.0/24
-23.134.17.0/24
+23.134.16.0/30
+23.134.17.0/30
Any insight or advice would be much appreciated.
Regards,
Jordan
[1] https://ftp.arin.net/pub/stats/arin/delegated-arin-extended-latest