
--On October 2, 2007 16:56:47 +0200 Werner Schram <[EMAIL PROTECTED]> wrote:

| Hi,
|
| You always need someone that disagrees in discussions like these, so
| here I go :)
|
|
| Adrian Popa wrote:
| > Would any database manage the huge volume of data (I have about 200GB
| >  of data from about 2 days of collecting)?
|
| Databases are designed to handle huge amounts of data, so the answer to
| your question would definitely be yes (companies like IBM build
| sql-based data warehouses that handle terabytes of data).

I agree that DBs are designed to do that. The only question is how much effort
you want to spend on optimizing queries and indices, as this depends heavily on
how the data is used. The larger the amount of data becomes, the more complex
these tasks become. This may result in higher costs, which not everyone is
willing to pay. IBM sells their products for good money; nfdump is free.
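How much an index helps depends on whether it matches the query actually being
run. A minimal sketch using Python's built-in sqlite3 module (SQLite only
stands in for "some SQL backend"; the table and column names are hypothetical):
the same range query on dst_ip is planned as a full table scan until an index
on that column exists.

```python
import sqlite3

# Hypothetical minimal flow table; the schema is illustrative only and is not
# the layout used by flowd, nfdump or any other collector.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flows (start INTEGER, src_ip INTEGER, dst_ip INTEGER, bytes INTEGER)"
)
conn.executemany(
    "INSERT INTO flows VALUES (?, ?, ?, ?)",
    [(t, t % 251, t % 97, t * 10) for t in range(10_000)],
)

# "Which sources talked to destinations in this range?"
QUERY = "SELECT src_ip FROM flows WHERE dst_ip BETWEEN ? AND ?"

def plan(sql, params):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

before = plan(QUERY, (10, 20))   # no index on dst_ip: full table scan
conn.execute("CREATE INDEX idx_dst ON flows (dst_ip)")
after = plan(QUERY, (10, 20))    # the planner can now search idx_dst instead
print(before)
print(after)
```

The exact plan strings vary between SQLite versions ("SCAN flows" vs. "SCAN
TABLE flows"), and an index on dst_ip does nothing for a query that filters on
src_ip or sorts by bytes; that is the per-query tuning effort described above.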

|
| > I also have mysql setups that take about a minute to search through 3
| > million records that take about 1GB (on similar hardware setups).
| >
|
| In my opinion, this doesn't really say anything. Does this database
| contain netflow information? If not, how is it comparable? Is the
| database indexed correctly? Is your query optimized for these indexes?
| Are you doing full text searches? Is mysql the best tool for the job?

Werner, take Adrian's statement as it is. It says that his setup takes this
amount of time for the job. It only supports the statement that DBs need to be
highly optimized. I've talked to many people doing netflow storage in DBs, and
it's not a piece of cake. Most of them did not have several TB of data, as we
do, for example.
Having netflow in a DB is tricky, but nonetheless doable.

|
| > This is why I think a binary format + a fast application can go
| > through the data much faster than a conventional application.
| >
|
| I partly agree, in that I think that a binary format *can* be faster,
| but I seriously doubt that the current nfcapd format *is* faster, as it
| doesn't include any indexes or other methods that improve the speed of
| random searches on fields other than the end time of the flow. For example:
| If I would like to see all hosts that contacted a certain subnet during
| a 1 hour period, nfdump would traverse all the flows in this period (in
| our case that would be about 30 million) and compare the destination of
| every flow to the subnet. A sql database with a b-tree index on the
| "destination ip" field would simply use this b-tree to filter out the
| correct records, thus preventing millions of comparisons and file
| operations.
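The difference Werner describes, a full pass versus an index lookup, can be
sketched in a few lines. This is a minimal illustration under stated
assumptions: flows are hypothetical (src_ip, dst_ip) integer pairs rather than
nfcapd records, and a sorted array searched with Python's bisect module stands
in for a real b-tree.

```python
import bisect
import ipaddress
import random

# Hypothetical flow records as (src_ip, dst_ip) integers; this is not the
# nfcapd record format, just enough to show the two access patterns.
random.seed(1)
flows = [(random.getrandbits(32), random.getrandbits(32)) for _ in range(100_000)]

net = ipaddress.ip_network("10.1.0.0/16")
lo, hi = int(net.network_address), int(net.broadcast_address)
# Make sure some flows actually hit the subnet.
flows += [(i, lo + i) for i in range(50)]

def scan(flows, lo, hi):
    """Linear pass: compare the destination of every flow (nfdump-style)."""
    return sorted({src for src, dst in flows if lo <= dst <= hi})

# Index built once: flows sorted by destination plus a parallel key list.
# A sorted array with binary search stands in for a b-tree here.
by_dst = sorted(flows, key=lambda f: f[1])
dst_keys = [dst for _, dst in by_dst]

def indexed(lo, hi):
    """Find the matching range in O(log n), then touch only those entries."""
    start = bisect.bisect_left(dst_keys, lo)
    end = bisect.bisect_right(dst_keys, hi)
    return sorted({src for src, _ in by_dst[start:end]})

print(len(scan(flows, lo, hi)), len(indexed(lo, hi)))
```

Both functions return the same set of source hosts; the indexed version pays
for it up front (sorting, extra storage) and must keep the index in sync as
new flows arrive.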

In your terms: I partly agree. Check the design goal of an application. Almost
all applications are (ab)used for tasks they were never meant for. nfdump was
not designed to answer the question above, and by nature it is slower than a
b-tree search in an SQL DB. But nfdump is *way* faster at building the top 10
talkers or finding the connection that transferred the most traffic in a
5-minute slot, which is, among other things, what it was designed for.
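A top-N query like this has to touch every flow in the slot anyway, so an
index buys nothing and a sequential read of the slot's file is the natural
access pattern. A minimal sketch of that aggregation, assuming hypothetical
(start_time_s, src_ip, dst_ip, nbytes) tuples; it illustrates the technique
and is not nfdump's implementation.

```python
import heapq
from collections import defaultdict

def top_talkers(flows, slot_start, n=10, slot_len=300):
    """Sum bytes per source IP over one time slot and return the n largest.

    flows: iterable of (start_time_s, src_ip, dst_ip, nbytes) tuples; this
    tuple layout is a hypothetical stand-in for real flow records.
    """
    totals = defaultdict(int)
    for start, src, _dst, nbytes in flows:   # single sequential pass
        if slot_start <= start < slot_start + slot_len:
            totals[src] += nbytes
    # Keep only n candidates in a heap instead of sorting every talker.
    return heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])

flows = [
    (0, "10.0.0.1", "192.0.2.9", 500),
    (10, "10.0.0.2", "192.0.2.9", 1500),
    (20, "10.0.0.1", "192.0.2.7", 700),
    (400, "10.0.0.3", "192.0.2.9", 9999),  # outside the 0-300 s slot
]
print(top_talkers(flows, 0, n=2))  # [('10.0.0.2', 1500), ('10.0.0.1', 1200)]
```

Keying totals on (src, dst) instead of src gives the "top connections" variant
with the same single pass.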

So pick the application and the storage format best suited to the task.
Backend DBs doubtless have their advantages, but also their disadvantages.
There is a project, outside this list and outside SWITCH, which implements a
backend for nfdump based on a PostgreSQL DB. I'm sure it can be distributed as
contrib code to nfdump for those who need it.

As for nfdump and indices: nfdump will get an IP index (b-tree or splay tree)
for more flexible search options, but will not go as far as using a backend DB.
It will be a mixture of both, optimized for the tasks in mind. It will also
include data compression, reducing file size down to 50% without a noticeable
speed penalty. The impatient may contact me off-list for demo code.
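One common way to add compression to a flat binary flow file without giving up
fast sequential reads is to compress whole blocks of fixed-size records, so a
reader decompresses one block at a time and streams through it. A minimal
sketch under stated assumptions: the record layout is hypothetical, zlib at a
low level is only a stand-in codec, and nothing here reflects the actual demo
code mentioned above. The compression ratio depends entirely on the data.

```python
import struct
import zlib

# Hypothetical fixed-size flow record: start time, src IPv4, dst IPv4, bytes.
# This is NOT the nfcapd on-disk layout; it only illustrates the idea.
RECORD = struct.Struct("<IIIQ")

def write_block(records):
    """Pack a batch of records into one binary block and compress it."""
    raw = b"".join(RECORD.pack(*r) for r in records)
    return zlib.compress(raw, 1)   # level 1: favour speed over ratio

def read_block(blob):
    """Decompress one block and yield its records in order."""
    yield from RECORD.iter_unpack(zlib.decompress(blob))

records = [(1191336000 + i, 0x0A000001, 0x0A000002 + i % 16, 1500) for i in range(1000)]
blob = write_block(records)
print(RECORD.size * len(records), "bytes raw,", len(blob), "bytes compressed")
```

Block-level compression keeps the decoder simple and cheap, which matters when
the main workload is a linear pass over every record in a time slot.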

To make a long story short: think about which tasks you have to do and
optimize for those tasks, either binary- or DB-wise. Do not abuse applications
for tasks they were never meant for.


    - Peter

|
| The flowd collector (http://www.mindrot.org/projects/flowd/) includes
| scripts that can be used to create a mysql-backed collector. I might
| set up a system next week to compare its performance to nfdump as I am
| quite curious to see how it actually compares (and to see if I am not
| making idle claims :)
|
| Werner
|
| > Adrian Popa
| >
| > On 10/2/07, Tristan RHODES <[EMAIL PROTECTED]> wrote:
| >> Will using a database backend to store flowdata help improve query
| >> times?  Has anyone experimented with this?
| >>
| >> Tristan Rhodes Weber State University
| >>
| >>
| >> -------------------------------------------------------------------------
| >>  This SF.net email is sponsored by: Microsoft Defy all challenges.
| >> Microsoft(R) Visual Studio 2005.
| >> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
| >> _______________________________________________ Nfsen-discuss
| >> mailing list [email protected]
| >> https://lists.sourceforge.net/lists/listinfo/nfsen-discuss
| >>
| >



--
_______ SWITCH - The Swiss Education and Research Network ______
Peter Haag,  Security Engineer,  Member of SWITCH CERT
PGP fingerprint: D9 31 D5 83 03 95 68 BA  FB 84 CA 94 AB FC 5D D7
SWITCH, Werdstrasse 2, P.O. Box,  CH-8021   Zurich, Switzerland
E-mail: [EMAIL PROTECTED] Web: http://www.switch.ch/

