Hi,
flow-capture is reporting many lost flows. I have a few collectors with
a load of ~1.00-2.00; each of them runs ~1-4 flow-capture processes,
each on a different port, capturing from different routers. Data is
stored in /dev/shm without compression (which minimises the load, the
hdd interrupts/operations caused by flow-capture, etc.), and then
another process (flow-cat) with -z9 compression moves the data to hdd.
That is done only *after* a part of the data has been stored on the
memory device.
The problem is that many flows are lost. There were more, but after the
change to uncompressed storage in shm the number decreased. The losses
still occur, though.
Oct 26 12:30:35 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7
dst_ip=...182 d_version=5 expecting=2037504870 received=2037504899 lost=29
Oct 26 12:30:35 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7
dst_ip=...182 d_version=5 expecting=2037504986 received=2037505044 lost=58
Oct 26 12:34:45 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7
dst_ip=...182 d_version=5 expecting=2039754081 received=2039754110 lost=29
Oct 26 12:34:45 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7
dst_ip=...182 d_version=5 expecting=2039754139 received=2039754168 lost=29
Oct 26 12:34:45 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7
dst_ip=...182 d_version=5 expecting=2039754284 received=2039754313 lost=29
Oct 26 12:34:45 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7
dst_ip=...182 d_version=5 expecting=2039754342 received=2039754400 lost=58
Oct 26 12:46:15 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7
dst_ip=...182 d_version=5 expecting=2046570502 received=2046570531 lost=29
Oct 26 12:46:19 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7
dst_ip=...182 d_version=5 expecting=2046618236 received=2046618265 lost=29
Oct 26 12:46:19 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7
dst_ip=...182 d_version=5 expecting=2046619222 received=2046619251 lost=29
Oct 26 12:46:47 .... flow-capture[22423]: ftpdu_seq_check(): src_ip=...9
dst_ip=...182 d_version=5 expecting=312924847 received=312924876 lost=29
Oct 26 12:46:50 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7
dst_ip=...182 d_version=5 expecting=2046932944 received=2046932973 lost=29
Oct 26 12:46:57 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7
dst_ip=...182 d_version=5 expecting=2046967338 received=2046967367 lost=29
Oct 26 12:46:57 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7
dst_ip=...182 d_version=5 expecting=2046968353 received=2046968382 lost=29
Oct 26 12:46:57 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7
dst_ip=...182 d_version=5 expecting=2046968440 received=2046968469 lost=29
Oct 26 12:46:57 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7
dst_ip=...182 d_version=5 expecting=2046986594 received=2046986623 lost=29
Oct 26 12:47:24 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7
dst_ip=...182 d_version=5 expecting=2047237589 received=2047237618 lost=29
Oct 26 12:47:30 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7
dst_ip=...182 d_version=5 expecting=2047368901 received=2047368959 lost=58
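To get totals out of messages like these, a small parser can help. The following is only a sketch in Python: the function name `summarize` and the regular expression are my own, written against the ftpdu_seq_check() line format shown above, and the sample is three of the entries copied from the log. Every gap here is 29 or 58, which would be consistent with one or two whole export datagrams going missing if the router packs 29 flows per datagram (that is an inference from the numbers, not something flow-capture reports).

```python
import re
from collections import Counter

# Matches the ftpdu_seq_check() messages shown above. \s+ also spans the
# line break that the mail client inserted in the middle of each entry.
SEQ_RE = re.compile(
    r"flow-capture\[\d+\]:\s+ftpdu_seq_check\(\):\s+"
    r"src_ip=(?P<src>\S+)\s+dst_ip=\S+\s+d_version=\d+\s+"
    r"expecting=(?P<exp>\d+)\s+received=(?P<rcv>\d+)\s+lost=(?P<lost>\d+)"
)

def summarize(log_text):
    """Return (lost flows per exporter, how often each gap size occurs)."""
    lost_per_src = Counter()
    gap_sizes = Counter()
    for m in SEQ_RE.finditer(log_text):
        lost = int(m.group("lost"))
        lost_per_src[m.group("src")] += lost
        gap_sizes[lost] += 1
    return lost_per_src, gap_sizes

# Three entries copied from the log above (addresses elided as in the original).
SAMPLE = """\
Oct 26 12:30:35 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7
dst_ip=...182 d_version=5 expecting=2037504870 received=2037504899 lost=29
Oct 26 12:30:35 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7
dst_ip=...182 d_version=5 expecting=2037504986 received=2037505044 lost=58
Oct 26 12:46:47 .... flow-capture[22423]: ftpdu_seq_check(): src_ip=...9
dst_ip=...182 d_version=5 expecting=312924847 received=312924876 lost=29
"""

lost_per_src, gap_sizes = summarize(SAMPLE)
for src, lost in sorted(lost_per_src.items()):
    print(src, lost)
```

Feeding it the whole syslog file instead of SAMPLE gives the per-router totals, which can then be compared with the kernel's UDP error counters.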
flow-capture configuration (a different port for each router):
flow-capture -w /dev/shm/netflow/*** 0/0/9990 -n287 -z0 \
    -p /var/run/flow-tool/flow-capture.pid -V 5
ifconfig and ip show no errors. Only netstat shows packet receive errors.
# ifconfig eth0 | grep errors
RX packets:1057280702 errors:0 dropped:0 overruns:0 frame:0
TX packets:1399440 errors:0 dropped:0 overruns:0 carrier:0
# ip -s -s link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UP qlen 1000
    link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff
    RX: bytes   packets     errors  dropped  overrun  mcast
    1673007985  1167862764  0       0        0        777
    RX errors:  length  crc  frame  fifo  missed
                0       0    0      0     0
    TX: bytes   packets     errors  dropped  carrier  collsns
    1111745916  1574367     0       0        0        0
    TX errors:  aborted  fifo  window  heartbeat
                0        0     0       0
# netstat -su
Udp:
    1058173680 packets received
    1518159 packets to unknown port received.
    53242 packet receive errors
    3907675 packets sent
    RcvbufErrors: 53242
# T=300; A=`netstat -su | awk '/packet receive errors/ {print $1}'`; \
  sleep $T; B=`netstat -su | awk '/packet receive errors/ {print $1}'`; \
  echo $(($B-$A)) errors in $T secs
29 errors in 300 secs
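The same measurement can be taken straight from /proc/net/snmp, which also allows InErrors and RcvbufErrors to be compared over one interval. This is a rough sketch, assuming the usual Linux layout of that file (one "Udp:" row of field names followed by one "Udp:" row of values). The function names are mine, and the sample text is built from the netstat figures above, with SndbufErrors=0 filled in as an assumption.

```python
import time

def read_udp_counters(snmp_text):
    """Parse the two 'Udp:' rows of /proc/net/snmp-style text into a dict."""
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Udp:")]
    names, values = rows[0][1:], rows[1][1:]   # header row, then value row
    return dict(zip(names, (int(v) for v in values)))

def sample_delta(interval, path="/proc/net/snmp"):
    """Return how much selected UDP counters grew during `interval` seconds."""
    with open(path) as f:
        before = read_udp_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = read_udp_counters(f.read())
    keys = ("InDatagrams", "InErrors", "RcvbufErrors")
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}

# Illustrative input only: values taken from the netstat -su output above,
# SndbufErrors is assumed to be 0 because netstat did not print it.
SAMPLE = """\
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 1058173680 1518159 53242 3907675 53242 0
"""

counters = read_udp_counters(SAMPLE)
print(counters["InErrors"], counters["RcvbufErrors"])
```

On the collector itself, `sample_delta(300)` would repeat the 300-second measurement. If InErrors and RcvbufErrors grow by the same amount, the drops are happening at the socket receive buffer rather than at the NIC.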
I've increased the receive buffers in the OS, in flow-capture, and on the NIC:
# ethtool --show-ring eth0
Ring parameters for eth0:
Current hardware settings:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 256
# sysctl -n net.core.rmem_default net.core.rmem_max
32768000
32768000
# grep 'define.*RCV_BUFSIZE' /usr/src/flow-tools-0.68.4.3/lib/ftlib.h
#define FT_RCV_BUFSIZE 2048 /* enough to handle largest export */
#define FT_SO_RCV_BUFSIZE (32*1024*1024) /* UDP recv socket buffer size */
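One thing that can be checked independently of flow-capture is how much receive buffer the kernel really grants for such a request. Note that 32*1024*1024 = 33554432 is slightly above the net.core.rmem_max of 32768000 shown above, and on Linux a SO_RCVBUF request above rmem_max is silently capped (the value read back is also doubled to account for bookkeeping overhead). The sketch below assumes flow-capture passes FT_SO_RCV_BUFSIZE to setsockopt(SO_RCVBUF), which I have not verified in the source; the function name is mine.

```python
import socket

REQUESTED = 32 * 1024 * 1024   # same value as FT_SO_RCV_BUFSIZE above

def granted_rcvbuf(requested=REQUESTED):
    """Ask for `requested` bytes of UDP receive buffer, return what the kernel reports."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        # On Linux this is typically twice the (possibly capped) request.
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        s.close()

print("requested:", REQUESTED, "granted:", granted_rcvbuf())
```

If the granted value is far below twice the request, the limit is being applied by the kernel and not by the #define.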
The router with ip ...7 has a cpu usage of ~30-40% and a few ports
carrying ~1-2.5 Gbit/s. No port on it is above 20% link utilisation.
In this example I use flow-tools-0.68-4.3. The collector has two
quad-core CPUs (8x2.66GHz), 16GB of RAM, and an SMP kernel 2.6.24.x
x86_64. On another collector I have a 2.6.28.4-x kernel, but it does not
seem to make a difference. The OS is Debian.
Any ideas?
--
Regards,
Jakub Słociński