cgivre commented on issue #1637: DRILL-7032: Ignore corrupt rows in a PCAP file
URL: https://github.com/apache/drill/pull/1637#issuecomment-462151344

Actually, some good news here… I ran some test queries on the corrupted file and it seemed to work pretty well. I didn’t get any exceptions!

```
jdbc:drill:zk=local> select src_ip, COUNT(*) as packet_count from dfs.test.`testv1.pcap`WHERE is_corrupt=1 GROUP BY src_ip ORDER BY packet_count DESC
. . . . . . .semicolon> LIMIT 10;
+-----------------------------------------+---------------+
| src_ip                                  | packet_count  |
+-----------------------------------------+---------------+
| 150.249.255.161                         | 176           |
| 150.249.255.24                          | 28            |
| 131.38.3.15                             | 26            |
| 111.248.196.128                         | 25            |
| 202.13.230.242                          | 20            |
| 163.28.217.199                          | 19            |
| 27.18.36.151                            | 18            |
| 2001:320f:c2ed:8693:1dff:f8f8:500:f1ed  | 17            |
| 203.70.190.81                           | 16            |
| 203.70.182.104                          | 13            |
+-----------------------------------------+---------------+
10 rows selected (0.944 seconds)

select src_ip, dst_ip from dfs.test.`testv1.pcap`WHERE is_corrupt=1 LIMIT 10;
+------------------+------------------+
| src_ip           | dst_ip           |
+------------------+------------------+
| 118.233.244.60   | 150.249.255.161  |
| 150.249.255.161  | 165.63.110.188   |
| 150.249.255.161  | 165.63.110.188   |
| 172.40.96.180    | 131.39.133.22    |
| 150.249.255.161  | 165.63.110.188   |
| 150.249.255.161  | 165.63.110.188   |
| 150.249.255.161  | 165.63.110.188   |
| 150.249.255.161  | 165.63.110.188   |
| 150.249.162.60   | 180.32.119.25    |
| 150.249.255.161  | 165.63.110.188   |
+------------------+------------------+
10 rows selected (1.031 seconds)

0: jdbc:drill:zk=local> SELECT src_port , dst_port , src_mac_address , dst_mac_address
. . . . . . .semicolon> FROM dfs.test.`testv1.pcap`
. . . . . . .semicolon> WHERE is_corrupt =1 LIMIT 10;
+-----------+-----------+--------------------+--------------------+
| src_port  | dst_port  | src_mac_address    | dst_mac_address    |
+-----------+-----------+--------------------+--------------------+
| 57058     | 443       | 00:0C:DB:1F:72:41  | 88:E0:F3:7A:66:F0  |
| 80        | 20706     | 00:0C:DB:1F:72:41  | 00:12:E2:C0:3F:09  |
| 80        | 20706     | 00:0C:DB:1F:72:41  | 00:12:E2:C0:3F:09  |
| 443       | 55972     | 00:0C:DB:1F:72:41  | CC:4E:24:1F:4E:00  |
| 80        | 20706     | 00:0C:DB:1F:72:41  | 00:12:E2:C0:3F:09  |
| 80        | 20706     | 00:0C:DB:1F:72:41  | 00:12:E2:C0:3F:09  |
| 80        | 20706     | 00:0C:DB:1F:72:41  | 00:12:E2:C0:3F:09  |
| 80        | 20706     | 00:0C:DB:1F:72:41  | 00:12:E2:C0:3F:09  |
| 4016      | 7699      | 00:0C:DB:1F:72:41  | 00:12:E2:C0:3F:09  |
| 80        | 20706     | 00:0C:DB:1F:72:41  | 00:12:E2:C0:3F:09  |
+-----------+-----------+--------------------+--------------------+
10 rows selected (0.751 seconds)

SELECT getCountryName(src_ip) AS country, COUNT(*) as packet_count
FROM dfs.test.`testv1.pcap`
WHERE is_corrupt=1
GROUP BY getCountryName(src_ip)
ORDER BY packet_count DESC
LIMIT 10;
+----------------+---------------+
| country        | packet_count  |
+----------------+---------------+
| Japan          | 269           |
| Taiwan         | 124           |
| United States  | 105           |
| Unknown        | 49            |
| China          | 26            |
| South Korea    | 8             |
| Australia      | 4             |
| Germany        | 3             |
| Hong Kong      | 2             |
| Italy          | 1             |
+----------------+---------------+
10 rows selected (1.519 seconds)

SELECT is_corrupt, COUNT(*) as packet_count
FROM dfs.test.`testv1.pcap`
GROUP BY is_corrupt;
+-------------+---------------+
| is_corrupt  | packet_count  |
+-------------+---------------+
| 0           | 6408          |
| 1           | 592           |
+-------------+---------------+
2 rows selected (0.931 seconds)
```
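
For a rough sense of scale, the share of packets flagged as corrupt can be worked out from the two counts in the final `GROUP BY is_corrupt` result. This is only a minimal Python sketch using the numbers copied from that output; the variable names are illustrative and not part of Drill or the PCAP reader.

```python
# Counts copied from the "GROUP BY is_corrupt" query output above
# (key = is_corrupt value, value = packet_count).
packet_counts = {0: 6408, 1: 592}

total_packets = sum(packet_counts.values())
corrupt_share = packet_counts[1] / total_packets

print(f"{packet_counts[1]} of {total_packets} packets flagged corrupt "
      f"({corrupt_share:.2%})")
```

With these counts, roughly 8.5% of the packets in `testv1.pcap` are marked corrupt, and the queries above still return results for them.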
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
