Thanks Mark.  So, the more immediate problem that I'm encountering has
nothing to do with my Linux cluster, but rather with flow-report running
out of memory.

Basically, when I run flow-report on ~1.7GB of level 6 compressed data
(one day's activity on a router), the process enters a terminal
"disk sleep" state and just spins endlessly until I kill it.  This
usually occurs when the process reaches around 96% of memory capacity.
Here is a sample snapshot of some stats after one such process has hung:

--> Version, kernel: 2.4.27, gcc: 3.3.4

--> Results of top: 

Mem:    901216k total,   895324k used,     5892k free,      256k buffers
Swap:  1999992k total,   887396k used,  1112596k free,     1400k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2930 netflow   19   0  884m 848m 827m D  2.6 96.5  34:39.24 flow-report

--> Contents of /proc/2930/status:

Name:   flow-report
State:  D (disk sleep)
Tgid:   2930
Pid:    2930
PPid:   2915
VmSize:   907724 kB
VmLck:         0 kB
VmRSS:    866036 kB
VmData:   905668 kB

--> Results of strace -p 2930:

read(0, "[EMAIL PROTECTED]"..., 32768) = 32708
read(0, "\336\232\201A\360k\3111T\33\262TE\261L\300\361|\24E\320"..., 32768) = 32708
read(0, "\336\232\201A\302\246C2\\\33\262TE\261L\300\202\3\321\310"..., 32768) = 32708

Clearly the process has run out of physical memory, but it still has
plenty of swap space that's not being used.  Also, I'm wondering about
its behavior once the process reaches this point.  It doesn't blow up
with a malloc error or some other memory-related issue.  It just keeps
making read() system calls without getting anywhere.
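
For anyone who wants to watch for this condition automatically, here is a minimal Python sketch, assuming a Linux /proc filesystem with the status format shown above. The function names are hypothetical helpers made up for illustration, not part of flow-tools.

```python
def parse_proc_status(text):
    """Parse 'Key:   value' lines (the /proc/<pid>/status format) into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def kb(fields, key):
    """Return a 'NNN kB' field as an integer number of kB, or None if absent."""
    value = fields.get(key)
    return int(value.split()[0]) if value else None


def is_disk_sleep(fields):
    """True when the State field starts with 'D' (uninterruptible disk sleep)."""
    return fields.get("State", "").startswith("D")


def read_status(pid):
    """Read and parse /proc/<pid>/status for a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_proc_status(f.read())
```

Calling read_status(2930) periodically and logging kb(fields, "VmRSS") alongside is_disk_sleep(fields) would show when the process stops growing and starts blocking.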

Is this behavior expected when a flow-report process gets too big?
Also, I was running flow-report on even larger files on a Solaris box
with twice as much memory.  On occasion a process would terminate after
running out of memory, but it never cycled like this, so it seems like
a Linux thing.
Any help would be greatly appreciated.

Thanks,
Ari


 
> -----Original Message-----
> From: Mark Fullmer [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, November 02, 2004 8:13 PM
> To: Ari Leichtberg
> Cc: [EMAIL PROTECTED]
> Subject: Re: [Flow-tools] Flow-tools on linux cluster (Mosix)
> 
> 
> On Nov 2, 2004, at 3:21 PM, Ari Leichtberg wrote:
> 
> >
> > On that note, does anybody know about the inner workings of
> > flow-report?
> > My general understanding is that it loads up a huge hashtable (or
> > other data structure) in memory and then basically dumps out quick
> > stats.  Not very cpu intensive.  Is that accurate?
> 
> If there are fewer than 64K buckets in a key it's a direct lookup,
> otherwise a hash.  So an IP protocol or IP port report would not use a
> hash, an IP address report would.  If the pps and bps calculations are
> not in the report the floating point calculations are skipped, which
> can really impact CPU cycles.
> 
> Flow-report can run many reports on one pass of the data depending on
> the memory available.  This usually translates to a big speed gain
> when running many reports vs flow-stat due to the reduced disk I/O.
> 
> --
> mark
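
As a rough illustration of the lookup strategy described in the quoted reply (not flow-report's actual code), here is a Python sketch: a key with a small value space indexes a preallocated array directly, while a larger key space falls back to a hash table. The class name and threshold constant are hypothetical, and treating exactly 64K values (the full port range) as "direct" is an assumption based on the remark that port reports do not use a hash.

```python
DIRECT_LIMIT = 65536  # assumed cutoff, from "64K buckets" in the quoted text


class FlowCounter:
    """Accumulate a per-key total using direct indexing or a hash table."""

    def __init__(self, key_space):
        # Small key spaces (IP protocol, port) get a flat preallocated array.
        self.direct = key_space <= DIRECT_LIMIT
        self.table = [0] * key_space if self.direct else {}

    def add(self, key, octets):
        if self.direct:
            self.table[key] += octets  # direct lookup: key is the index
        else:
            # Hash lookup: one entry per distinct key actually seen.
            self.table[key] = self.table.get(key, 0) + octets

    def get(self, key):
        return self.table[key] if self.direct else self.table.get(key, 0)
```

In the hashed case memory grows with the number of distinct keys seen, so an IP address report over a full day of traffic can get far larger than a port or protocol report.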


_______________________________________________
Flow-tools mailing list
[EMAIL PROTECTED]
http://mailman.splintered.net/mailman/listinfo/flow-tools
