I have to add that I think Linux's buffering system keeps both nfcapd files in RAM (cached) when they are read by my plugin, because the files occupy at most 200MB each (and I have 7GB of RAM, of which 192680k is used for buffers and 5733148k for cache). Anyway, I haven't checked whether this is actually true (but I hope it is)...
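For what it's worth, a quick way to check that assumption could look like the sketch below (assuming the vmtouch utility is installed - it is not part of nfsen/nfdump - and using the same path placeholders as in my nfdump command quoted further down):

    # how much of one capture file is currently resident in the page cache?
    vmtouch -v /data/nfsen/profiles/live/$border/nfcapd.$timeslot

    # overall buffer/cache usage, before and after a plugin cycle
    free -m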
However, I'll try the solution with different profiles to see if I can gain something in performance.

Peter Haag wrote:
> A few remarks:
>
> --On February 9, 2007 13:31:48 +0200 Adrian Popa <[EMAIL PROTECTED]> wrote:
>
> | Peter, thank you for your prompt reply.
> |
> | Your information will help me make some tests and then decide if what I want is manageable...
> |
> | If I understood correctly, nfcapd collects data for 5 minutes and then passes it to nfsen (and
> | nfsen distributes it to different profiles/channels). This means that the filters are applied
> | 'offline', and applying the filters to all profiles will take roughly the same amount of time it
> | takes now with my plug-in.
>
> As I understand it, your plugin runs nfdump 200 times, which takes considerably more time than a
> single run of nfprofile, due to different IO characteristics, unless you can manage to keep all
> the data in the FS cache.
>
> | I had hoped that nfcapd was able to apply the filters while collecting data (it would have to do
> | this in real time), so that I could 'win' computational time. That way, after 5 minutes of data
> | collection, I would have updated graphs for 200 profiles without additional computation. What
> | you say is that after 5 minutes the data is filtered into the existing profiles, and this
> | filtering and writing to disk takes a while. For me, it means that the time may exceed the
> | 5-minute deadline. The additional problem is that the plugins are run after the profiles are
> | populated (or so I think). This means that my floodsearch plugin may run 4 minutes after the
> | live profile has its data - which is not what I want.
> |
> | I believe that the 'features' I want imply significantly modifying nfcapd - which may not be
> | possible, because in your design nfcapd must be fast enough not to drop incoming packets (which,
> | after all, is more important!)...
>
> That's exactly the reason why I have no filters implemented in the collector. In the end, you need
> roughly the same amount of time independent of where in the chain you do it. If you cannot manage
> to process the data within 5 min, you most likely won't be able to manage it if the data were
> filtered earlier. Of course this depends a little bit on how efficiently things can be implemented.
> I have already thought about instant processing of flows as they hit the collector, but this is at
> the end of the todo list.
>
> - Peter
>
> | I fear that the only sustainable solution would be to spread the workload over other collectors.
> |
> | To add some numbers to this, here's what I have now:
> |
> | System: IBM, 2 x Intel(R) Xeon(TM) CPU 3.40GHz, 7GB RAM, RAID 5 array with 6 disks (72GB each) -
> | currently nfsen uses 200GB, Red Hat Enterprise Linux 4.
> |
> | Netflow: 2 netflow v9 sources, exporting about 2.5 million flows each every 5 minutes (packet
> | input rate at the collector is about 7Mbps).
> | Currently I have only 3 profiles, and updating them takes about 2 seconds.
> |
> | My plugin, which generates traffic statistics based on these exports, requires about 200 runs of
> | the nfdump command and takes about 2.5-3 minutes to run.
> |
> | In the near future (1-2 months) I will have to add at least 4 more sources, 2 of which have an
> | expected data rate of 2Gbps. I know that my collector can handle the load, but I'm not confident
> | that my plugin will be able to finish in time.
> | This is why I had all these questions.
> |
> | For now I will try to add 20 profiles, to see the response time, but, as I said before, I don't
> | believe I will get a big speed increase.
> |
> | Thank you again, and if I misunderstood something, please correct me.
> |
> | Adrian Popa
> |
> | PS. I don't want a snapshot with shadow channels yet - I will use it when you are ready to
> | release it to the general public.
> |
> | Peter Haag wrote:
> | > Adrian,
> | >
> | > To add some numbers to my previous mail:
> | >
> | > My nfsen-current developer installation:
> | >
> | > System: HP ProLiant DL360 G3 3GHz
> | >         1 GB RAM
> | >         Internal Compaq Smart Array 5i, 2x70GB disks, mirrored
> | >         OS: OpenBSD 4.0
> | >
> | > 2 netflow sources configured, each about 25 - 30MB of flows every 5 min.
> | > 30 channels to update (about 12 shadow channels)
> | > time for profiling: 10s
> | >
> | > Some other plugins - such as PortTracker - need much more time.
> | > So there is some room for more channels ...
> | >
> | > - Peter
> | >
> | > --On February 9, 2007 9:18:50 +0200 Adrian Popa <[EMAIL PROTECTED]> wrote:
> | >
> | > | Peter, I have a question about performance.
> | > |
> | > | If I were to give up on my plugin and instead create about 200 profiles,
> | > | each searching for a different network prefix and input/output
> | > | interface in the flows, would nfsen be able to handle this kind of data?
> | > | (Actually I have 20 prefixes on 6 routers with 1 to 4 interfaces each,
> | > | and I'd like to plot upstream and downstream traffic - so I think there
> | > | will be more than 200 profiles...)
> | > | For each profile I could set an expire time of 10 minutes (because I
> | > | don't need to save the flows - just to get the graphs).
> | > |
> | > | Or, as you suggested, I could use channels in the same profile, but I
> | > | have no idea how I could create or manage them... (perhaps you have some
> | > | tips on where I could find some documentation about that).
> | > |
> | > | My main question is: do you think nfsen can handle 200 profiles? I have
> | > | only a production machine, and I'm not eager to experiment on it! :)
> | > |
> | > | Thank you!
> | > |
> | > | Peter Haag wrote:
> | > | > Hi Adrian,
> | > | >
> | > | > --On February 1, 2007 15:28:32 +0200 Adrian Popa <[EMAIL PROTECTED]> wrote:
> | > | >
> | > | > | Hello,
> | > | > |
> | > | > | I have a question about the performance of nfdump, but first let me
> | > | > | explain what I'm trying to do:
> | > | > | I have a plugin that searches the collected flows for specific network
> | > | > | prefixes (or AS-es) on each exporting router, on specific interfaces. The
> | > | > | information is then fed into custom rrd files and plotted as png images.
> | > | > | Searching is done with a top 1 record/bytes statistic, filtering by
> | > | > | 'in if x and net 1.2.3.0/24'. Here's an example:
> | > | > |
> | > | > | $nfdump -r /data/nfsen/profiles/live/$border/nfcapd.$timeslot -n 1 -s
> | > | > | record/bytes -o "fmt:%ts %td %pr %sap -> %dap %pkt %byt %bps %in %out
> | > | > | %sas %das %fl" '$ifType $if and net $prefix'
> | > | > |
> | > | > | I have to search for input traffic on a specific interface for a
> | > | > | specific network prefix, and also for output traffic for the same thing.
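(A side note to make the "200 runs" figure above concrete: stripped of the RRD handling, my plugin boils down to a nested loop like the sketch below. The loop variables are only illustrative, the nfdump call is the one quoted just above with the -o "fmt:..." option left out for brevity, and the roughly 50 AS queries look the same, just with a different filter expression.)

    for border in $borders; do                # the 2 exporting routers
      for iface in $interfaces; do            # 3-4 interfaces per router
        for dir in "in if" "out if"; do       # input and output traffic
          for prefix in $prefixes; do         # ~20 network prefixes
            nfdump -r /data/nfsen/profiles/live/$border/nfcapd.$timeslot \
                   -n 1 -s record/bytes "$dir $iface and net $prefix"
            # ...parse the top-1 line and update the corresponding RRD...
          done
        done
      done
    done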
> | > | > | I managed to do this, and it works well, but the execution time for 2
> | > | > | borders (with 3-4 interfaces each), 20 prefixes and 50 AS-es, at a peak
> | > | > | traffic of about 2Gbps, is about 3.5 minutes.
> | > | >
> | > | > If I understand you right, you are going to call this nfdump command for
> | > | > each of the prefixes, which results in a lot of sequential nfdump commands.
> | > | >
> | > | > | In the future I will want to monitor other routers on the same
> | > | > | principle. As far as I can see, I can do that, but either I gather less
> | > | > | data or I use a different machine for collecting.
> | > | > |
> | > | > | A colleague of mine proposed that I split my script (which is 100%
> | > | > | sequential) into several threads that run at the same time. Each thread
> | > | > | would call nfdump and update its particular rrd.
> | > | > |
> | > | > | My question to you is this: assuming that I start the new processes like
> | > | > | threads (or, more likely, like forked processes), would I get a speed
> | > | > | increase? I'd like to add that this script keeps the processor usage at
> | > | > | about 60-80%.
> | > | >
> | > | > The CPU usage is only half of the story for your plugin. Almost every time,
> | > | > I/O is much more of a problem. For each nfdump command you read a lot of data.
> | > | > If this amount of data does not fit into the file system cache of your OS,
> | > | > the performance drops rapidly. So I'd recommend you analyse how your
> | > | > system behaves during such a plugin cycle. Have a look at the IO using iostat
> | > | > and check the service time of your disks. If you still have room, creating
> | > | > more threads can result in better performance. If your IO system is at its
> | > | > limit, you will not gain anything, and CPU will stay at 60-80% as your system
> | > | > has lots of IO wait. To overcome this, you would need more RAM to increase the
> | > | > memory available for the IO file system cache. A high service time in iostat
> | > | > means slow disks - so you'll need the right balance of disks and memory.
> | > | >
> | > | > Furthermore, you can try to optimise IO by reading the data only once and
> | > | > doing parallel processing - the way nfprofile profiles all your channels:
> | > | > it reads the data only once and applies all filters to the same data.
> | > | >
> | > | > Coming back to your plugin - you may try to optimise IO by creating a profile
> | > | > with a channel per prefix and creating adequate filters per channel. Your
> | > | > plugin then needs to create the Top 1 stat per channel, which may result in
> | > | > reading less data overall - but this is just a guess.
> | > | >
> | > | > So - it's a bit of all.
> | > | > Hope this helps anyway.
> | > | >
> | > | > - Peter
> | > | >
> | > | > | I don't know if the forked processes would load the same input file into
> | > | > | memory again and again, or if they would share the same file (lowering
> | > | > | memory consumption)?
> | > | > |
> | > | > | What are your recommendations?
> | > | > |
> | > | > | Thank you for your time,
> | > | > |
> | > | > | --
> | > | > | Adrian Popa
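Following the iostat suggestion above, the first thing I'll probably do is watch the disks during one plugin cycle and, if there is headroom, test a crude forked variant of the per-prefix queries. Roughly like this (just a sketch - the variables are the same placeholders as in my nfdump command above, and the output file names are made up):

    # in a second terminal, while one plugin cycle runs:
    iostat -x 60        # watch the service time / utilisation of the RAID volume

    # crude test: fork the per-prefix queries instead of running them one after another
    i=0
    for prefix in $prefixes; do
        i=$((i+1))
        nfdump -r /data/nfsen/profiles/live/$border/nfcapd.$timeslot \
               -n 1 -s record/bytes "$ifType $if and net $prefix" > top.$i.txt &
    done
    wait    # all forked nfdump processes read the same file, so it should stay in the page cache

If the service time is already high and the CPU sits at 60-80% with lots of IO wait, forking clearly won't buy anything, and I'll look into the channel-per-prefix profile instead.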
--
Adrian Popa
Junior Network Engineer
Romtelecom S.A.
Divizia Centrul National de Operare Retea
Departament Transport IP & Metro
Compartiment IP Core
