Hello,

On 10/2/07, Werner Schram <[EMAIL PROTECTED]> wrote:
> Hi,
>
> You always need someone that disagrees in discussions like these, so
> here I go :)
>
>
> Adrian Popa wrote:
> > Would any database manage the huge volume of data (I have about 200GB
> >  of data from about 2 days of collecting)?
>
> Databases are designed to handle huge amounts of data, so the answer to
> your question would definitely be yes (companies like IBM build
> sql-based data warehouses that handle terabytes of data).
>

Thank you for that information, but I was judging the performance of
the database engine based on what I've seen in real life.

> > I also have MySQL setups that take about a minute to search through 3
> > million records occupying about 1GB (on similar hardware setups).
> >
>
> In my opinion, this doesn't really say anything. Does this database
> contain netflow information? If not, how is it comparable? Is the
> database indexed correctly? Is your query optimized for these indexes?
> Are you doing full text searches? Is mysql the best tool for the job?
>

The table I was talking about holds syslog information, and has a date
field and a text field used to store the actual message. Searches
through it are very slow and painful. If mysql isn't the best tool for
the job, I don't know what is...
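To make the indexing questions concrete for a table like this, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for MySQL, which is an assumption: the exact query-plan text differs between engines and versions. The table, column and index names (`syslog`, `logged_at`, `idx_syslog_logged_at`) are hypothetical. A date-range query can be answered from the index, while a substring search on the message text has to scan every row.

```python
import sqlite3

def plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail lines joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is the human-readable detail text.
    return " | ".join(str(row[-1]) for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE syslog (id INTEGER PRIMARY KEY, "
    "logged_at TEXT NOT NULL, message TEXT NOT NULL)"
)
# Hypothetical schema: only the date column is indexed, the message text is not.
conn.execute("CREATE INDEX idx_syslog_logged_at ON syslog (logged_at)")
conn.executemany(
    "INSERT INTO syslog (logged_at, message) VALUES (?, ?)",
    [
        ("2007-10-01 23:59:00", "sshd: session opened"),
        ("2007-10-02 08:15:00", "kernel: eth0 link up"),
        ("2007-10-02 09:30:00", "sshd: failed password"),
    ],
)

date_sql = "SELECT id FROM syslog WHERE logged_at BETWEEN ? AND ?"
text_sql = "SELECT id FROM syslog WHERE message LIKE ?"
date_args = ("2007-10-02 00:00:00", "2007-10-02 23:59:59")
text_args = ("%sshd%",)

date_plan = plan(conn, date_sql, date_args)
text_plan = plan(conn, text_sql, text_args)
print(date_plan)  # expected to mention idx_syslog_logged_at (index search)
print(text_plan)  # expected to be a full SCAN of syslog

date_hits = conn.execute(date_sql, date_args).fetchall()
text_hits = conn.execute(text_sql, text_args).fetchall()
```

An ordinary B-tree index cannot help a `LIKE` pattern that starts with a wildcard, so searches inside the message text need a full-text index or a different tool, however well the date column is indexed.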

> > This is why I think a binary format + a fast application can go
> > through the data much faster than a conventional application.
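A minimal sketch of what a "binary format + fast application" scan looks like, assuming a hypothetical fixed-width record layout (this is NOT the actual nfcapd file format). Decoding fixed-size records with `struct` is cheap, but a query on the destination address still has to decode and test every record, because nothing in the file indexes that field.

```python
import ipaddress
import struct

# Hypothetical fixed-width record (NOT the real nfcapd layout), little-endian:
# end time (uint32, Unix seconds), src IPv4, dst IPv4 (uint32 each), bytes (uint64).
RECORD = struct.Struct("<IIIQ")

def pack_flows(flows):
    """Serialize (end_time, src, dst, nbytes) tuples into one contiguous buffer."""
    return b"".join(
        RECORD.pack(t, int(ipaddress.IPv4Address(s)), int(ipaddress.IPv4Address(d)), n)
        for t, s, d, n in flows
    )

def scan_dst_subnet(buf: bytes, subnet: str):
    """Linear scan: decode every record and keep those whose dst lies in subnet."""
    net = ipaddress.IPv4Network(subnet)
    lo, hi = int(net.network_address), int(net.broadcast_address)
    return [
        (t, str(ipaddress.IPv4Address(src)), str(ipaddress.IPv4Address(dst)), n)
        for t, src, dst, n in RECORD.iter_unpack(buf)
        if lo <= dst <= hi
    ]

buf = pack_flows([
    (1191312000, "10.0.0.5", "192.0.2.10", 1500),
    (1191312060, "10.0.0.6", "198.51.100.7", 400),
    (1191312120, "10.0.0.7", "192.0.2.200", 9000),
])
matches = scan_dst_subnet(buf, "192.0.2.0/24")
```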
> >
>
> I partly agree, in that I think that a binary format *can* be faster,
> but I seriously doubt that the current nfcapd format *is* faster, as it
> doesn't include any indexes or other methods that improve the speed of
> random searches on fields other than the end time of the flow. For example,
> if I wanted to see all hosts that contacted a certain subnet during
> a 1-hour period, nfdump would traverse all the flows in this period (in
> our case, about 30 million) and compare the destination of
> every flow to the subnet. An SQL database with a B-tree index on the
> "destination ip" field would simply use this B-tree to find the
> matching records, avoiding millions of comparisons and file
> operations.
>
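The difference between the two approaches can be sketched in a few lines of Python. This is an illustration under stated assumptions: a sorted list searched with `bisect` stands in for a real B-tree (both give ordered, logarithmic-time lookups), flows are reduced to hypothetical in-memory destination addresses, and IPv4 addresses are compared as integers so that a subnet becomes one contiguous range. At 30 million flows per hour the scan has to test roughly 8,300 flows for every second of captured traffic, whereas the index lookup only touches the matching slice.

```python
import bisect
import ipaddress
import random

def build_dst_index(dsts):
    """Sorted (dst_ip_int, row_id) pairs: an in-memory stand-in for a B-tree index."""
    return sorted((int(ipaddress.IPv4Address(d)), i) for i, d in enumerate(dsts))

def lookup_subnet(index, subnet):
    """Return row ids whose dst is in subnet, touching only the matching slice."""
    net = ipaddress.IPv4Network(subnet)
    lo = bisect.bisect_left(index, (int(net.network_address), -1))
    hi = bisect.bisect_right(index, (int(net.broadcast_address), float("inf")))
    return sorted(row_id for _, row_id in index[lo:hi])

def scan_subnet(dsts, subnet):
    """Full scan: compare every flow's destination to the subnet."""
    net = ipaddress.IPv4Network(subnet)
    return [i for i, d in enumerate(dsts) if ipaddress.IPv4Address(d) in net]

random.seed(7)
dsts = [str(ipaddress.IPv4Address(random.randrange(2**32))) for _ in range(10_000)]
dsts[1234] = "192.0.2.42"  # guarantee at least one flow hits the subnet
index = build_dst_index(dsts)
hits = lookup_subnet(index, "192.0.2.0/24")
```

Both functions return the same rows; the trade-off is that the index must be built and kept up to date as flows arrive, which is the cost a database pays on insert to make these lookups cheap.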
> The flowd collector (http://www.mindrot.org/projects/flowd/) includes
> scripts that can be used to create a MySQL-backed collector. I might
> set up a system next week to compare its performance to nfdump, as I am
> quite curious to see how it actually compares (and to see if I am not
> making idle claims :)
>
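For a comparison like this, a small harness helps keep the numbers fair: run the same question against each backend several times and compare medians. A minimal sketch; the backend names and placeholder workloads below are hypothetical, and in a real test each callable would run one query against nfdump or against the MySQL-backed flowd setup.

```python
import statistics
import time
from typing import Callable, Dict

def median_runtime(query: Callable[[], object], repeats: int = 5) -> float:
    """Run query() `repeats` times and return the median wall-clock time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        query()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

def compare(queries: Dict[str, Callable[[], object]], repeats: int = 5) -> Dict[str, float]:
    """Time each named backend query with the same repeat count."""
    return {name: median_runtime(fn, repeats) for name, fn in queries.items()}

# Placeholder workloads standing in for "same question, different backend".
results = compare({
    "full_scan": lambda: sum(1 for i in range(200_000) if i % 97 == 0),
    "lookup": lambda: 200_000 // 97,
})
```

Repeated runs may be served from the operating system's page cache, so it is worth recording the first (cold) run separately from the median.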

Looking forward to your measurements.

> Werner
>
> > Adrian Popa
> >
> > On 10/2/07, Tristan RHODES <[EMAIL PROTECTED]> wrote:
> >> Will using a database backend to store flowdata help improve query
> >> times?  Has anyone experimented with this?
> >>
> >> Tristan Rhodes
> >> Weber State University
> >>
> >>
> >> -------------------------------------------------------------------------
> >> This SF.net email is sponsored by: Microsoft
> >> Defy all challenges. Microsoft(R) Visual Studio 2005.
> >> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> >> _______________________________________________
> >> Nfsen-discuss mailing list
> >> [email protected]
> >> https://lists.sourceforge.net/lists/listinfo/nfsen-discuss
> >>
> >
>
>
