Hi,

I have a dedicated SunFire V210 collecting NetFlow from 60 Cisco routers
on my campus, running flow-tools 0.67.  Recently the flow-capture
processes started logging (very heavily) ftpdu_seq_check() errors,
indicating that the sequence numbers on the flow records are not being
processed in the correct order.  Here's a sample of the logs (filtered
to show records from only one router).  This is the raw output; directly
following it is a stripped-down, more readable version:
 
Jan 26 16:03:35 borg.nyu.edu flow-capture[9428]: [ID 148558 local6.info]
ftpdu_seq_check(): src_ip=xxx.xxx.1.1 dst_ip=0.0.0.0 d_version=5
expecting=58 received=610 lost=552
Jan 26 16:03:36 borg.nyu.edu flow-capture[9428]: [ID 650602 local6.info]
ftpdu_seq_check(): src_ip=xxx.xxx.1.1 dst_ip=0.0.0.0 d_version=5
expecting=3 received=1 lost=4294967293
Jan 26 16:03:36 borg.nyu.edu flow-capture[9428]: [ID 467302 local6.info]
ftpdu_seq_check(): src_ip=xxx.xxx.1.1 dst_ip=0.0.0.0 d_version=5
expecting=610 received=6 lost=4294966691
Jan 26 16:03:36 borg.nyu.edu flow-capture[9428]: [ID 300642 local6.info]
ftpdu_seq_check(): src_ip=xxx.xxx.1.1 dst_ip=0.0.0.0 d_version=5
expecting=958 received=33 lost=4294966370
Jan 26 16:03:36 borg.nyu.edu flow-capture[9428]: [ID 964148 local6.info]
ftpdu_seq_check(): src_ip=xxx.xxx.1.1 dst_ip=0.0.0.0 d_version=5
expecting=291 received=51 lost=4294967055
Jan 26 16:03:37 borg.nyu.edu flow-capture[9428]: [ID 972929 local6.info]
ftpdu_seq_check(): src_ip=xxx.xxx.1.35 dst_ip=0.0.0.0 d_version=5
expecting=22 received=78 lost=56
Jan 26 16:03:37 borg.nyu.edu flow-capture[9428]: [ID 393162 local6.info]
ftpdu_seq_check(): src_ip=xxx.xxx.1.35 dst_ip=0.0.0.0 d_version=5
expecting=107 received=1 lost=4294967189
Jan 26 16:03:39 borg.nyu.edu flow-capture[9428]: [ID 874228 local6.info]
ftpdu_seq_check(): src_ip=xxx.xxx.1.1 dst_ip=0.0.0.0 d_version=5
expecting=78 received=341 lost=263
Jan 26 16:03:39 borg.nyu.edu flow-capture[9428]: [ID 516023 local6.info]
ftpdu_seq_check(): src_ip=xxx.xxx.1.1 dst_ip=0.0.0.0 d_version=5
expecting=6 received=1 lost=4294967290
Jan 26 16:03:39 borg.nyu.edu flow-capture[9428]: [ID 706140 local6.info]
ftpdu_seq_check(): src_ip=xxx.xxx.1.1 dst_ip=0.0.0.0 d_version=5
expecting=602 received=12 lost=4294966705
Jan 26 16:03:39 borg.nyu.edu flow-capture[9428]: [ID 165033 local6.info]
ftpdu_seq_check(): src_ip=xxx.xxx.1.35 dst_ip=0.0.0.0 d_version=5
expecting=44915 received=17 lost=4294922397
Jan 26 16:03:39 borg.nyu.edu flow-capture[9428]: [ID 678267 local6.info]
ftpdu_seq_check(): src_ip=xxx.xxx.1.35 dst_ip=0.0.0.0 d_version=5
expecting=24 received=365 lost=341
Jan 26 16:03:41 borg.nyu.edu flow-capture[9428]: [ID 467412 local6.info]
ftpdu_seq_check(): src_ip=xxx.xxx.1.35 dst_ip=0.0.0.0 d_version=5
expecting=341 received=771 lost=430

Here's the same data with only the time/expecting/received/lost fields:

Jan 26 16:03:35 expecting=58 received=610 lost=552
Jan 26 16:03:36 expecting=3 received=1 lost=4294967293
Jan 26 16:03:36 expecting=610 received=6 lost=4294966691
Jan 26 16:03:36 expecting=958 received=33 lost=4294966370
Jan 26 16:03:36 expecting=291 received=51 lost=4294967055
Jan 26 16:03:37 expecting=22 received=78 lost=56
Jan 26 16:03:37 expecting=107 received=1 lost=4294967189
Jan 26 16:03:39 expecting=78 received=341 lost=263
Jan 26 16:03:39 expecting=6 received=1 lost=4294967290
Jan 26 16:03:39 expecting=602 received=12 lost=4294966705
Jan 26 16:03:39 expecting=44915 received=17 lost=4294922397
Jan 26 16:03:39 expecting=24 received=365 lost=341
Jan 26 16:03:41 expecting=341 received=771 lost=430
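
One way to produce a stripped-down listing like the one above is sketched
below.  It is only an illustration: the regular expression and the name
strip_records are assumptions of this sketch, not part of flow-tools, and
it assumes the syslog records look exactly like the raw sample (each one
wrapped over several lines).

```python
import re

# Matches one syslog record after all line wrapping has been collapsed
# to single spaces. The layout is assumed from the raw sample only.
RECORD = re.compile(
    r"(\w{3} \d{1,2} \d\d:\d\d:\d\d) \S+ flow-capture\[\d+\]: .*?"
    r"expecting=(\d+) received=(\d+) lost=(\d+)"
)

def strip_records(raw):
    """Return 'time expecting=.. received=.. lost=..' lines from raw log text."""
    # Collapse newlines and repeated blanks so wrapped records become contiguous.
    text = " ".join(raw.split())
    return [
        f"{m[1]} expecting={m[2]} received={m[3]} lost={m[4]}"
        for m in RECORD.finditer(text)
    ]

if __name__ == "__main__":
    import sys
    # Usage: python strip_records.py < messages.log
    print("\n".join(strip_records(sys.stdin.read())))
```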

It seems that the records are being processed out of order.  For
example, the first row received seq# 610, which was not expected until 1
second later, 2 rows down.  The pattern repeats itself.  Also, whenever
the received number is lower than the expected one, the collector's
"lost" value wraps around to just under 2^32 (e.g. 4294967293) instead
of reporting a real loss count.
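
For what it's worth, the "lost" values in the sample above can all be
reproduced with a simple rule.  The sketch below is inferred purely from
the logged numbers, not taken from the flow-tools source, and the names
compute_lost and mismatches are my own: when the received number is above
the expected one, "lost" is the plain difference; otherwise it comes out
as 0xFFFFFFFF - expecting + received, i.e. an unsigned 32-bit wraparound.

```python
# Hypothetical reconstruction of the "lost" arithmetic, inferred only from
# the log sample in this message (not from the flow-tools source).
MAX_U32 = 0xFFFFFFFF  # largest unsigned 32-bit value

def compute_lost(expecting, received):
    """Return the "lost" count that matches the logged values."""
    if received > expecting:
        return received - expecting
    # Assumed wraparound branch: received is behind the expected number.
    return MAX_U32 - expecting + received

# (expecting, received, lost) triples copied from the stripped-down log.
SAMPLE = [
    (58, 610, 552), (3, 1, 4294967293), (610, 6, 4294966691),
    (958, 33, 4294966370), (291, 51, 4294967055), (22, 78, 56),
    (107, 1, 4294967189), (78, 341, 263), (6, 1, 4294967290),
    (602, 12, 4294966705), (44915, 17, 4294922397), (24, 365, 341),
    (341, 771, 430),
]

def mismatches(rows):
    """Return the rows whose logged "lost" value the rule does not reproduce."""
    return [row for row in rows if compute_lost(row[0], row[1]) != row[2]]

if __name__ == "__main__":
    print(mismatches(SAMPLE))  # prints [] for the sample above
```

If that rule is right, the huge values are not real losses; they only
mean that a datagram arrived with a lower sequence number than the one
the collector was waiting for.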

The collector's resources are not overloaded.  Also, I'm pretty sure
I've ruled out network delays as an explanation.  First, the errors are
happening too frequently to be occasional reordering in transit.
Second, I'm even getting them from routers that are directly linked to
the collector via a single GigE switch, which should have (practically)
zero delay.  And last, it's happening consistently with almost every
router on my network.

I'm trying to narrow down the domain of possible explanations, and the
more I look at it the more it seems to be a configuration error or bug
on the flow-capture side.  Has anybody seen this before, or does anyone
know how to handle the problem?

Thanks,

Ari Leichtberg

