Hello,

I was hoping the fine folks here could give me a quick sanity check. I'm by no means an awk guru, so I'm likely missing something obvious, and I wanted to ask here before I started flapping my gums on bugs@.

I'm working on a simple awk snippet to convert the IP range data in the Extended Delegation Statistics file from ARIN [1] into CIDR blocks. I have a snippet that works perfectly fine on mawk and gawk, but not on the base system awk. I'm 99% sure I'm not using any GNUisms, since the command works perfectly when I break it up into two parts.

The snippet below does not work with base awk, but does work with gawk and mawk (running on a 6.6 -stable system):

  awk -F '|' '{ if ( $3 == "ipv4" && $2 == "US") printf("%s/%d\n", $4, 32-log($5)/log(2))}' delegated-arin-extended-latest.txt


The command does output data, but it also throws errors for certain lines:

  awk: log result out of range
  input record number 94027, file delegated-arin-extended-latest.txt
  source line number 1

Most CIDR blocks are calculated correctly, but about 10% of them are wrong (i.e. something that should be calculated as a /24 is instead calculated as a /30).

However, when I break it up into two parts, it produces the expected output:

  awk -F '|' '{ if ( $3 == "ipv4" && $2 == "US") print($4, $5)}' delegated-arin-extended-latest.txt | awk  '{printf("%s/%d\n", $1, 32-log($2)/log(2)) }'
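For comparison, here is a minimal sketch of an integer-only variant that avoids log() altogether by counting how many times the address count in field 5 can be halved. The names to_cidr and prefix_len are hypothetical helpers made up for this example, and the sketch assumes every count is an exact power of two; for other counts it would not match the truncated result of the log() formula. It is only meant as a cross-check, not as an explanation of the errors above.

```shell
# Hypothetical helper: reads delegation records on stdin and prints
# one CIDR block per US IPv4 record.
to_cidr() {
  awk -F '|' '
    # Count how many times n can be halved before reaching 1.
    # Assumption: n is an exact power of two.
    function prefix_len(n,    bits) {
      bits = 0
      while (n > 1) {
        n = int(n / 2)
        bits++
      }
      return 32 - bits
    }
    $3 == "ipv4" && $2 == "US" { printf("%s/%d\n", $4, prefix_len($5)) }
  '
}
```

It would be run as: to_cidr < delegated-arin-extended-latest.txt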

As the output below shows, gawk, mawk, and base awk all print the same number of lines for the one-part command, but the hash of the base awk output differs from the other two.

  luna$ gawk -F '|' '{ if ( $3 == "ipv4" && $2 == "US") printf("%s/%d\n", $4, 32-log($5)/log(2))}' delegated-*-latest.txt | wc -l
     56446
  luna$ mawk -F '|' '{ if ( $3 == "ipv4" && $2 == "US") printf("%s/%d\n", $4, 32-log($5)/log(2))}' delegated-*-latest.txt | wc -l
     56446
  luna$ awk -F '|' '{ if ( $3 == "ipv4" && $2 == "US") printf("%s/%d\n", $4, 32-log($5)/log(2))}' delegated-*-latest.txt 2>/dev/null | wc -l
     56446

  luna$ awk -F '|' '{ if ( $3 == "ipv4" && $2 == "US") printf("%s/%d\n", $4, 32-log($5)/log(2))}' delegated-arin-extended-latest.txt 2>/dev/null | md5
    6f549bbc0799bc202c12695f8530d1df
  luna$ gawk -F '|' '{ if ( $3 == "ipv4" && $2 == "US") printf("%s/%d\n", $4, 32-log($5)/log(2))}' delegated-arin-extended-latest.txt 2>/dev/null | md5
    40c28b8ebfd2796e1ae15d9f6401c0c1
  luna$ mawk -F '|' '{ if ( $3 == "ipv4" && $2 == "US") printf("%s/%d\n", $4, 32-log($5)/log(2))}' delegated-arin-extended-latest.txt 2>/dev/null | md5
    40c28b8ebfd2796e1ae15d9f6401c0c1


Example of the differences:

--- mawk.txt    Sat Jun  6 18:43:30 2020
+++ awk.txt     Sat Jun  6 18:43:38 2020
@@ -29,7 +29,7 @@
 9.64.0.0/10
 9.128.0.0/9
 11.0.0.0/8
-12.0.0.0/8
+12.0.0.0/30
 13.0.0.0/11
 13.32.0.0/12
 13.48.0.0/14
@@ -415,7 +415,7 @@
 23.90.64.0/20
 23.90.80.0/21
 23.90.88.0/22
-23.90.92.0/22
+23.90.92.0/30
 23.90.96.0/19
 23.91.0.0/19
 23.91.32.0/19
@@ -545,8 +545,8 @@
 23.133.224.0/24
 23.133.240.0/24
 23.134.0.0/24
-23.134.16.0/24
-23.134.17.0/24
+23.134.16.0/30
+23.134.17.0/30


Any insight or advice would be much appreciated.

Regards,

Jordan

[1] https://ftp.arin.net/pub/stats/arin/delegated-arin-extended-latest

