cgivre commented on issue #1637: DRILL-7032: Ignore corrupt rows in a PCAP file
URL: https://github.com/apache/drill/pull/1637#issuecomment-462151344
 
 
   Actually, some good news here…  
   I ran some test queries on the corrupted file and they seemed to work pretty well.  I didn’t get any exceptions!
   ```
    jdbc:drill:zk=local> select src_ip, COUNT(*) as packet_count from dfs.test.`testv1.pcap` WHERE is_corrupt=1 GROUP BY src_ip ORDER BY packet_count DESC
   . . . . . . .semicolon> LIMIT 10;
   +-----------------------------------------+---------------+
   |                 src_ip                  | packet_count  |
   +-----------------------------------------+---------------+
   | 150.249.255.161                         | 176           |
   | 150.249.255.24                          | 28            |
   | 131.38.3.15                             | 26            |
   | 111.248.196.128                         | 25            |
   | 202.13.230.242                          | 20            |
   | 163.28.217.199                          | 19            |
   | 27.18.36.151                            | 18            |
   | 2001:320f:c2ed:8693:1dff:f8f8:500:f1ed  | 17            |
   | 203.70.190.81                           | 16            |
   | 203.70.182.104                          | 13            |
   +-----------------------------------------+---------------+
   10 rows selected (0.944 seconds)
   
   
   select src_ip, dst_ip from dfs.test.`testv1.pcap` WHERE is_corrupt=1 LIMIT 10;
   +------------------+------------------+
   |      src_ip      |      dst_ip      |
   +------------------+------------------+
   | 118.233.244.60   | 150.249.255.161  |
   | 150.249.255.161  | 165.63.110.188   |
   | 150.249.255.161  | 165.63.110.188   |
   | 172.40.96.180    | 131.39.133.22    |
   | 150.249.255.161  | 165.63.110.188   |
   | 150.249.255.161  | 165.63.110.188   |
   | 150.249.255.161  | 165.63.110.188   |
   | 150.249.255.161  | 165.63.110.188   |
   | 150.249.162.60   | 180.32.119.25    |
   | 150.249.255.161  | 165.63.110.188   |
   +------------------+------------------+
   10 rows selected (1.031 seconds)
   
   
   0: jdbc:drill:zk=local> SELECT  src_port , dst_port , src_mac_address , dst_mac_address
   . . . . . . .semicolon> FROM dfs.test.`testv1.pcap`
   . . . . . . .semicolon> WHERE is_corrupt =1 LIMIT 10;
   +-----------+-----------+--------------------+--------------------+
   | src_port  | dst_port  |  src_mac_address   |  dst_mac_address   |
   +-----------+-----------+--------------------+--------------------+
   | 57058     | 443       | 00:0C:DB:1F:72:41  | 88:E0:F3:7A:66:F0  |
   | 80        | 20706     | 00:0C:DB:1F:72:41  | 00:12:E2:C0:3F:09  |
   | 80        | 20706     | 00:0C:DB:1F:72:41  | 00:12:E2:C0:3F:09  |
   | 443       | 55972     | 00:0C:DB:1F:72:41  | CC:4E:24:1F:4E:00  |
   | 80        | 20706     | 00:0C:DB:1F:72:41  | 00:12:E2:C0:3F:09  |
   | 80        | 20706     | 00:0C:DB:1F:72:41  | 00:12:E2:C0:3F:09  |
   | 80        | 20706     | 00:0C:DB:1F:72:41  | 00:12:E2:C0:3F:09  |
   | 80        | 20706     | 00:0C:DB:1F:72:41  | 00:12:E2:C0:3F:09  |
   | 4016      | 7699      | 00:0C:DB:1F:72:41  | 00:12:E2:C0:3F:09  |
   | 80        | 20706     | 00:0C:DB:1F:72:41  | 00:12:E2:C0:3F:09  |
   +-----------+-----------+--------------------+--------------------+
   10 rows selected (0.751 seconds)
   
   SELECT getCountryName(src_ip) AS country, COUNT(*) as packet_count FROM dfs.test.`testv1.pcap` WHERE is_corrupt=1  GROUP BY getCountryName(src_ip) ORDER BY packet_count DESC LIMIT 10;
   +----------------+---------------+
   |    country     | packet_count  |
   +----------------+---------------+
   | Japan          | 269           |
   | Taiwan         | 124           |
   | United States  | 105           |
   | Unknown        | 49            |
   | China          | 26            |
   | South Korea    | 8             |
   | Australia      | 4             |
   | Germany        | 3             |
   | Hong Kong      | 2             |
   | Italy          | 1             |
   +----------------+---------------+
   10 rows selected (1.519 seconds)
   
   SELECT is_corrupt, COUNT(*) as packet_count FROM dfs.test.`testv1.pcap` GROUP BY is_corrupt;
   +-------------+---------------+
   | is_corrupt  | packet_count  |
   +-------------+---------------+
   | 0           | 6408          |
   | 1           | 592           |
   +-------------+---------------+
   2 rows selected (0.931 seconds)
   ```
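
   For reference, the last result works out to 592 corrupt packets out of 7000, or roughly 8.5%. A possible follow-up query that reports this share directly is sketched below. It was not run against this file, and it assumes `is_corrupt` compares against `1` the same way it does in the queries above; the column aliases are only illustrative.

   ```sql
   -- Sketch only: count corrupt packets and report their share of the file.
   -- Assumes is_corrupt = 1 marks a corrupt packet, as in the queries above.
   SELECT
     SUM(CASE WHEN is_corrupt = 1 THEN 1 ELSE 0 END) AS corrupt_packets,
     COUNT(*) AS total_packets,
     -- 100.0 forces decimal arithmetic so the percentage is not truncated
     100.0 * SUM(CASE WHEN is_corrupt = 1 THEN 1 ELSE 0 END) / COUNT(*) AS corrupt_pct
   FROM dfs.test.`testv1.pcap`;
   ```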

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
