Hi,
I am seeing many lost flows reported by flow-capture. There are a few collectors with a load of ~1.00-2.00; each of them runs ~1-4 flow-capture processes, each on a different port and capturing from a different router. Data is written to /dev/shm without compression (which minimises the load, disk interrupts and disk operations caused by flow-capture), and then another process (flow-cat) with -z9 compression moves the data to the hard disk. That happens only *after* a portion of the data has been stored on the memory device. The problem is that many flows are still lost. There were more losses before; after switching to uncompressed storage in shm the number decreased, but the losses have not gone away.

Oct 26 12:30:35 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2037504870 received=2037504899 lost=29
Oct 26 12:30:35 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2037504986 received=2037505044 lost=58
Oct 26 12:34:45 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2039754081 received=2039754110 lost=29
Oct 26 12:34:45 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2039754139 received=2039754168 lost=29
Oct 26 12:34:45 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2039754284 received=2039754313 lost=29
Oct 26 12:34:45 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2039754342 received=2039754400 lost=58
Oct 26 12:46:15 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2046570502 received=2046570531 lost=29
Oct 26 12:46:19 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2046618236 received=2046618265 lost=29
Oct 26 12:46:19 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2046619222 received=2046619251 lost=29
Oct 26 12:46:47 .... flow-capture[22423]: ftpdu_seq_check(): src_ip=...9 dst_ip=...182 d_version=5 expecting=312924847 received=312924876 lost=29
Oct 26 12:46:50 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2046932944 received=2046932973 lost=29
Oct 26 12:46:57 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2046967338 received=2046967367 lost=29
Oct 26 12:46:57 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2046968353 received=2046968382 lost=29
Oct 26 12:46:57 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2046968440 received=2046968469 lost=29
Oct 26 12:46:57 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2046986594 received=2046986623 lost=29
Oct 26 12:47:24 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2047237589 received=2047237618 lost=29
Oct 26 12:47:30 .... flow-capture[22401]: ftpdu_seq_check(): src_ip=...7 dst_ip=...182 d_version=5 expecting=2047368901 received=2047368959 lost=58
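
To see how the losses are spread across exporters, the messages above can be totalled per source. Every gap in the log is 29 or 58 flows; since the NetFlow v5 sequence number counts flows, that pattern would be consistent with one or two whole export packets being dropped each time, though that is an inference from the numbers and not something the log states. Below is a minimal Python sketch; the function name summarize_losses is hypothetical, and the regular expression only assumes the message layout shown above.

```python
import re
from collections import defaultdict

# Field layout copied from the ftpdu_seq_check() messages in the log above.
SEQ_RE = re.compile(
    r"flow-capture\[\d+\]: ftpdu_seq_check\(\): "
    r"src_ip=(?P<src>\S+) dst_ip=\S+ d_version=\d+ "
    r"expecting=\d+ received=\d+ lost=(?P<lost>\d+)"
)

def summarize_losses(text):
    """Return {src_ip: (gap_count, total_flows_lost)} for all messages in text."""
    totals = defaultdict(lambda: [0, 0])
    # finditer works whether the messages are one per line or run together.
    for match in SEQ_RE.finditer(text):
        entry = totals[match.group("src")]
        entry[0] += 1
        entry[1] += int(match.group("lost"))
    return {src: (gaps, lost) for src, (gaps, lost) in totals.items()}
```

Feeding it the contents of the syslog file that flow-capture writes to gives a quick per-router total to compare against the UDP receive error counters.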


flow-capture configuration (a different port for each router):
flow-capture -w /dev/shm/netflow/*** 0/0/9990 -n287 -z0 -p /var/run/flow-tool/flow-capture.pid -V 5

ifconfig and ip show no errors; only netstat reports packet receive errors.
# ifconfig eth0 | grep errors
         RX packets:1057280702 errors:0 dropped:0 overruns:0 frame:0
         TX packets:1399440 errors:0 dropped:0 overruns:0 carrier:0

# ip -s -s link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
   link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff
   RX: bytes  packets    errors  dropped overrun mcast
   1673007985 1167862764 0       0       0       777
   RX errors: length  crc     frame   fifo    missed
              0       0       0       0       0
   TX: bytes  packets    errors  dropped carrier collsns
   1111745916 1574367    0       0       0       0
   TX errors: aborted fifo    window  heartbeat
              0       0       0       0

# netstat -su
Udp:
   1058173680 packets received
   1518159 packets to unknown port received.
   53242 packet receive errors
   3907675 packets sent
   RcvbufErrors: 53242

# T=300; A=`netstat -su | grep "packet receive errors" | awk '{print $1}'`; sleep $T; B=`netstat -su | grep "packet receive errors" | awk '{print $1}'`; echo $(($B-$A)) errors in $T secs;
29 errors in 300 secs
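
The same measurement can be taken straight from /proc/net/snmp, which is where netstat -su gets these counters on Linux, and which makes it easy to track RcvbufErrors specifically. This is only a sketch: the function names are hypothetical, and it assumes the usual two-line "Udp:" layout (a header line with counter names followed by a line with the values).

```python
import time

def udp_counters(snmp_text):
    """Parse the 'Udp:' header/value line pair of /proc/net/snmp into a dict."""
    # "UdpLite:" lines do not start with "Udp:", so they are skipped here.
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Udp:")]
    header, values = rows[0], rows[1]
    return {name: int(value) for name, value in zip(header[1:], values[1:])}

def counter_delta(before, after, name="RcvbufErrors"):
    """Difference of one counter between two udp_counters() snapshots."""
    return after[name] - before[name]

def measure(interval, path="/proc/net/snmp"):
    """Sample the counters twice, interval seconds apart (Linux only)."""
    with open(path) as f:
        before = udp_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = udp_counters(f.read())
    return counter_delta(before, after)
```

Calling measure(300) corresponds to the T=300 one-liner above, but counts receive-buffer overflows only.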

I've increased the receive buffers in the OS, in flow-capture and on the NIC:

# ethtool --show-ring eth0
Ring parameters for eth0:
Current hardware settings:
RX:        4096
RX Mini:    0
RX Jumbo:    0
TX:        256

# sysctl -n net.core.rmem_default net.core.rmem_max
32768000
32768000

# grep 'define.*RCV_BUFSIZE' /usr/src/flow-tools-0.68.4.3/lib/ftlib.h
#define FT_RCV_BUFSIZE         2048  /* enough to handle largest export */
#define FT_SO_RCV_BUFSIZE (32*1024*1024) /* UDP recv socket buffer size */
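
One thing worth checking is what buffer size the kernel actually grants. On Linux a SO_RCVBUF request is capped at net.core.rmem_max, and 32*1024*1024 = 33554432 is slightly above the 32768000 shown above, so the request would be clamped to rmem_max. This assumes flow-capture passes FT_SO_RCV_BUFSIZE to setsockopt(SO_RCVBUF), which I have not verified in the source. A minimal Python sketch that makes the same kind of request on a throwaway UDP socket and reads back the result (effective_rcvbuf is a hypothetical helper name):

```python
import socket

def effective_rcvbuf(requested):
    """Request a receive buffer size and return the size the kernel reports."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        # Linux reports twice the granted value (bookkeeping overhead),
        # after capping the request at net.core.rmem_max.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        sock.close()
```

Comparing effective_rcvbuf(32 * 1024 * 1024) with twice the rmem_max value shows whether the cap is in effect on a given collector.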

The router with IP ...7 has CPU usage of ~30-40% and a few ports carrying ~1-2.5 Gbit/s. No port on it is more than 20% saturated.

In this example I use flow-tools-0.68-4.3. The collector has two quad-core CPUs (8 x 2.66 GHz), 16 GB of RAM and an SMP 2.6.24.x x86_64 kernel. Another collector runs a 2.6.28.4-x kernel, but that does not seem to make a difference. The OS is Debian.

Any ideas?

--
                   |
Regards             |
Jakub Słociński     |
                   |


_______________________________________________
Flow-tools mailing list
[email protected]
http://mailman.splintered.net/mailman/listinfo/flow-tools
