Jakub Słociński
Tue, 27 Oct 2009 05:53:46 -0700
Hi,

flow-capture is reporting many lost flows. I have a few collectors with a load of ~1.00-2.00; each of them runs ~1-4 flow-capture processes on different ports, capturing from different routers. Data is written to /dev/shm without compression (which minimises the load and the disk interrupts/operations caused by flow-capture), and a separate process (flow-cat with -z9 compression) then moves the data to the hard disk. That happens only *after* a chunk of data has been stored on the memory device. The problem is that many flows are lost. There were more before; after switching to uncompressed storage in shm the number decreased, but the losses have not gone away.
Oct 26 12:30:35 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2037504870 received=2037504899 lost=29
Oct 26 12:30:35 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2037504986 received=2037505044 lost=58
Oct 26 12:34:45 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2039754081 received=2039754110 lost=29
Oct 26 12:34:45 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2039754139 received=2039754168 lost=29
Oct 26 12:34:45 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2039754284 received=2039754313 lost=29
Oct 26 12:34:45 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2039754342 received=2039754400 lost=58
Oct 26 12:46:15 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2046570502 received=2046570531 lost=29
Oct 26 12:46:19 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2046618236 received=2046618265 lost=29
Oct 26 12:46:19 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2046619222 received=2046619251 lost=29
Oct 26 12:46:47 .... flow-capture[22423]: ftpdu_seq_check(): src_ip=...9 dst_ip=...182 d_version=5 expecting=312924847 received=312924876 lost=29
Oct 26 12:46:50 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2046932944 received=2046932973 lost=29
Oct 26 12:46:57 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2046967338 received=2046967367 lost=29
Oct 26 12:46:57 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2046968353 received=2046968382 lost=29
Oct 26 12:46:57 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2046968440 received=2046968469 lost=29
Oct 26 12:46:57 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2046986594 received=2046986623 lost=29
Oct 26 12:47:24 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2047237589 received=2047237618 lost=29
Oct 26 12:47:30 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2047368901 received=2047368959 lost=58
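
To see how the losses add up per exporter, the ftpdu_seq_check() lines above can be summarised with a small shell function. This is only a sketch: the function name summarize_lost and the log path in the usage comment are hypothetical, and it assumes the syslog format shown above, with src_ip= and lost= as whitespace-separated fields. Every gap in the excerpt is a multiple of 29, which would be consistent with whole export datagrams of 29 records going missing rather than single flows, though that is an inference from the numbers and not something flow-capture reports.

```shell
#!/bin/sh
# Sketch: sum the "lost=" counts logged by ftpdu_seq_check() per exporter.
# Reads syslog lines on stdin and prints one line per exporter:
#   <src_ip> <number of gaps> <total lost flows>
# The function name is made up for this example.
summarize_lost() {
    awk '
        index($0, "ftpdu_seq_check()") {
            src = ""; lost = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^src_ip=/) src  = substr($i, 8)   # text after "src_ip="
                if ($i ~ /^lost=/)   lost = substr($i, 6)   # text after "lost="
            }
            if (src != "" && lost != "") {
                gaps[src]++
                flows[src] += lost
            }
        }
        END {
            for (s in gaps) printf "%s %d %d\n", s, gaps[s], flows[s]
        }
    ' | sort
}

# Usage (the log path is an assumption, adjust to the local syslog setup):
#   grep flow-capture /var/log/syslog | summarize_lost
```

Grouping by src_ip keeps exporters apart, since each router has its own sequence counter; dividing the total by the span of sequence numbers covered by the log would then give a rough loss ratio.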
Flow-capture configuration (a different port for each router):

flow-capture -w /dev/shm/netflow/*** 0/0/9990 -n287 -z0 -p /var/run/flow-tool/flow-capture.pid -V 5
ifconfig and ip report no errors; only netstat shows packet receive errors.
# ifconfig eth0 | grep errors
RX packets:1057280702 errors:0 dropped:0 overruns:0 frame:0
TX packets:1399440 errors:0 dropped:0 overruns:0 carrier:0
# ip -s -s link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets    errors  dropped overrun mcast
    1673007985 1167862764 0       0       0       777
    RX errors: length  crc     frame   fifo    missed
               0       0       0       0       0
    TX: bytes  packets    errors  dropped carrier collsns
    1111745916 1574367    0       0       0       0
    TX errors: aborted fifo    window  heartbeat
               0       0       0       0
# netstat -su
Udp:
    1058173680 packets received
    1518159 packets to unknown port received.
    53242 packet receive errors
    3907675 packets sent
    RcvbufErrors: 53242

# T=300; A=`netstat -su | grep "packet receive errors" | awk '{print $1}'`; sleep $T; B=`netstat -su | grep "packet receive errors" | awk '{print $1}'`; echo $(($B-$A)) errors in $T secs;
29 errors in 300 secs

I've increased the receive buffers in the OS, in flow-capture and on the NIC:

# ethtool --show-ring eth0
Ring parameters for eth0:
Current hardware settings:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             256

# sysctl -n net.core.rmem_default net.core.rmem_max
32768000
32768000

# grep 'define.*RCV_BUFSIZE' /usr/src/flow-tools-0.68.4.3/lib/ftlib.h
#define FT_RCV_BUFSIZE 2048 /* enough to handle largest export */
#define FT_SO_RCV_BUFSIZE (32*1024*1024) /* UDP recv socket buffer size */
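
The same counters that netstat -su prints can also be read by name from /proc/net/snmp, which avoids grepping the human-readable text. The following is a minimal sketch under the assumption that the kernel exposes the usual two "Udp:" lines (a header line with field names, then a line with values); the function names udp_counter and udp_counter_delta are made up for this example. "packet receive errors" corresponds to InErrors there, and RcvbufErrors keeps its name.

```shell
#!/bin/sh
# Sketch: print one counter from the "Udp:" section of /proc/net/snmp.
# usage: udp_counter NAME [FILE]      e.g. udp_counter RcvbufErrors
udp_counter() {
    awk -v name="$1" '
        # First "Udp:" line holds the field names: remember each column.
        $1 == "Udp:" && !seen { for (i = 2; i <= NF; i++) col[$i] = i; seen = 1; next }
        # Second "Udp:" line holds the values: print the requested column.
        $1 == "Udp:" && seen  { if (name in col) print $(col[name]); exit }
    ' "${2:-/proc/net/snmp}"
}

# Sketch: report how much a counter grew over an interval.
# usage: udp_counter_delta NAME SECONDS [FILE]
udp_counter_delta() {
    before=$(udp_counter "$1" "$3")
    sleep "$2"
    after=$(udp_counter "$1" "$3")
    echo "$((after - before)) $1 in $2 secs"
}

# Usage:
#   udp_counter_delta RcvbufErrors 300
```

If RcvbufErrors grows in step with InErrors, as the equal values in the netstat output above suggest, the drops are happening at the socket receive buffer and not at the NIC.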
The router with IP ...7 has a CPU usage of ~30-40% and a few ports carrying ~1-2.5 Gbit/s. No port on it exceeds 20% link utilisation.
In this example I use flow-tools-0.68-4.3. The collector has a BiQuad CPU (8x2.66GHz) and 16GB of RAM, and runs an SMP 2.6.24.x x86_64 kernel. Another collector runs a 2.6.28.4-x kernel, but that does not seem to make a difference. The OS is Debian.
Any ideas?
--
Regards,
Jakub Słociński
_______________________________________________
Flow-tools mailing list
flow-to...@splintered.net
http://mailman.splintered.net/mailman/listinfo/flow-tools