Re: [pmacct-discussion] mySQL ATON/NTOA and pmacct performance

2014-09-18 Thread itriA30110
Hi Paolo and Scott,

  Your discussion pointed out an important issue - what is the proper design 
when it comes to high frequently IP-based queries?

  Inspired by this thread, my idea is to use postgresql and count on its 
cidr/inet function. (I am using mysql and store ipv4 as string(15))

  This way the IP address is readable and processing IP address is also 
efficient.  It seems worthy to try.  I will try this approach later. 

  Any comment is appreciated.  
-Original Message-
From: pmacct-discussion [mailto:pmacct-discussion-boun...@pmacct.net] On Behalf 
Of Paolo Lucente
Sent: Thursday, September 18, 2014 8:44 AM
To: Scott Pettit - Vorco
Cc: pmacct-discussion@pmacct.net
Subject: Re: [pmacct-discussion] mySQL ATON/NTOA and pmacct performance

Hi Scott,

Great. Let me know how your testing goes. Should you hit any bugs or gaps, feel 
free to contact me privately to solve them. Then we can summarize on the list.

Cheers,
Paolo

On Wed, Sep 17, 2014 at 04:35:05AM +, Scott Pettit - Vorco wrote:
 In MySQL (any SQL actually), string searches are computationally very 
 expensive when compared with integers. If you’re dealing with a 
 relatively small number of rows then the difference isn’t a problem.
 
 It’s like asking “show me all the rows where the number is between 1 
 and 1000” versus “show me all the rows that look like something 
 between one and one thousand in words”.  The second statement is the 
 same question but requires a lot more processing because the database 
 has to think about every number as a word.  It might only take 
 fractions of a second longer for one row, but multiply that by 300 
 million rows (which is what I’m trying to analyse) and the difference 
 is huge. PostgreSQL handles this really well as it has support for 
 IP’s as a column type which presumably converts to integers in the back end.
 
 I will try pmacct with ipv4 on my test collector with sql_num_hosts 
 and if it works well then having the same for inet6_aton would be great.
 
 -Scott
 
 
 On 17/09/14 16:03, itria30...@itri.org.tw itria30...@itri.org.tw wrote:
 
 Hi Scott and Paolo,
 
   May I ask a question per your good experience?
 
   What is the performance difference between using string ipv4 and 
 integer ipv4?  And potential impact?
 
   I have a PC running sfacctd that collects 2 giga-switches.  And it 
 collects around 5M records into mysql per day.
 
   I am satisfied with the default string ipv4 for its convenience of 
 being readable.  And I would like to know the performance drawback.
 
 
  The content of this message and any attachments may be privileged, 
 confidential or sensitive. Any unauthorised used is prohibited. Views 
 expressed in this message are those of the individual sender, except where 
 stated otherwise with appropriate authority. All pricing provided is valid at 
 the time of writing only and due to factors such as the exchange rate, may 
 change without notice. Sales are made subject to our Terms  Conditions, 
 available on our website or on request.
 

___
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists



本信件可能包含工研院機密資訊,非指定之收件者,請勿使用或揭露本信件內容,並請銷毀此信件。 
This email may contain confidential information. Please do not use or disclose 
it in any way and delete it if you are not the intended recipient.
___
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists

Re: [pmacct-discussion] mySQL ATON/NTOA and pmacct performance

2014-09-18 Thread Olivier Benghozi
Hi guys,

I just wanted to summarize that:

It looks like a fine way to address IPs in a DB seems to be the pgsql way. 
Fast, and you can easily select data using subnets without relying on regexp. 
While in mysql you either obtain a int value difficult to handle, or a string 
very slow to search (choose between plague and cholera).

For some reasons Scott is migrating from something that works to something 
that doesn't match the needs and causes problems.

So, the DB engine choice was obviously not decided according to the needs. So 
the reasons were wrong. Rollback.
Problem solved: there's no problem :)


regards,
Olivier

Le 18 sept. 2014 à 12:04, itria30...@itri.org.tw a écrit :

 Hi Paolo and Scott,
 
  Your discussion pointed out an important issue - what is the proper design 
 when it comes to high frequently IP-based queries?
 
  Inspired by this thread, my idea is to use postgresql and count on its 
 cidr/inet function. (I am using mysql and store ipv4 as string(15))
 
  This way the IP address is readable and processing IP address is also 
 efficient.  It seems worthy to try.  I will try this approach later. 
 
  Any comment is appreciated.  
 -Original Message-
 From: pmacct-discussion [mailto:pmacct-discussion-boun...@pmacct.net] On 
 Behalf Of Paolo Lucente
 Sent: Thursday, September 18, 2014 8:44 AM
 To: Scott Pettit - Vorco
 Cc: pmacct-discussion@pmacct.net
 Subject: Re: [pmacct-discussion] mySQL ATON/NTOA and pmacct performance
 
 Hi Scott,
 
 Great. Let me know how your testing goes. Should you hit any bugs or gaps, 
 feel free to contact me privately to solve them. Then we can summarize on the 
 list.
 
 Cheers,
 Paolo
 
 On Wed, Sep 17, 2014 at 04:35:05AM +, Scott Pettit - Vorco wrote:
 In MySQL (any SQL actually), string searches are computationally very 
 expensive when compared with integers. If you’re dealing with a 
 relatively small number of rows then the difference isn’t a problem.
 
 It’s like asking “show me all the rows where the number is between 1 
 and 1000” versus “show me all the rows that look like something 
 between one and one thousand in words”.  The second statement is the 
 same question but requires a lot more processing because the database 
 has to think about every number as a word.  It might only take 
 fractions of a second longer for one row, but multiply that by 300 
 million rows (which is what I’m trying to analyse) and the difference 
 is huge. PostgreSQL handles this really well as it has support for 
 IP’s as a column type which presumably converts to integers in the back end.
 
 I will try pmacct with ipv4 on my test collector with sql_num_hosts 
 and if it works well then having the same for inet6_aton would be great.
 
 -Scott
 
 
 On 17/09/14 16:03, itria30...@itri.org.tw itria30...@itri.org.tw wrote:
 
 Hi Scott and Paolo,
 
 May I ask a question per your good experience?
 
 What is the performance difference between using string ipv4 and 
 integer ipv4?  And potential impact?
 
 I have a PC running sfacctd that collects 2 giga-switches.  And it 
 collects around 5M records into mysql per day.
 
 I am satisfied with the default string ipv4 for its convenience of 
 being readable.  And I would like to know the performance drawback.


___
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists


Re: [pmacct-discussion] mySQL ATON/NTOA and pmacct performance

2014-09-17 Thread Paolo Lucente
Hi Scott,

Great. Let me know how your testing goes. Should you hit any bugs
or gaps, feel free to contact me privately to solve them. Then we
can summarize on the list.

Cheers,
Paolo

On Wed, Sep 17, 2014 at 04:35:05AM +, Scott Pettit - Vorco wrote:
 In MySQL (any SQL actually), string searches are computationally very
 expensive when compared with integers. If you’re dealing with a relatively
 small number of rows then the difference isn’t a problem.
 
 It’s like asking “show me all the rows where the number is between 1 and
 1000” versus “show me all the rows that look like something between one
 and one thousand in words”.  The second statement is the same question but
 requires a lot more processing because the database has to think about
 every number as a word.  It might only take fractions of a second longer
 for one row, but multiply that by 300 million rows (which is what I’m
 trying to analyse) and the difference is huge. PostgreSQL handles this
 really well as it has support for IP’s as a column type which presumably
 converts to integers in the back end.
 
 I will try pmacct with ipv4 on my test collector with sql_num_hosts and if
 it works well then having the same for inet6_aton would be great.
 
 -Scott
 
 
 On 17/09/14 16:03, itria30...@itri.org.tw itria30...@itri.org.tw wrote:
 
 Hi Scott and Paolo,
 
   May I ask a question per your good experience?
 
   What is the performance difference between using string ipv4 and
 integer ipv4?  And potential impact?
 
   I have a PC running sfacctd that collects 2 giga-switches.  And it
 collects around 5M records into mysql per day.
 
   I am satisfied with the default string ipv4 for its convenience of
 being readable.  And I would like to know the performance drawback.
 
 
  The content of this message and any attachments may be privileged, 
 confidential or sensitive. Any unauthorised used is prohibited. Views 
 expressed in this message are those of the individual sender, except where 
 stated otherwise with appropriate authority. All pricing provided is valid at 
 the time of writing only and due to factors such as the exchange rate, may 
 change without notice. Sales are made subject to our Terms  Conditions, 
 available on our website or on request.
 

___
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists

Re: [pmacct-discussion] mySQL ATON/NTOA and pmacct performance

2014-09-16 Thread Paolo Lucente
Hi Scott,

You are looking for sql_num_hosts: true. This is tested to work
with IPv4 addresses. I'm reasonably sure it won't work with IPv6:
i seem to recall INET6_ATON was not available in 2011, the time of
the original implementation of the feature. Should not be a biggie
to introduce it.

Cheers,
Paolo

On Mon, Sep 15, 2014 at 11:54:25PM +, Scott Pettit - Vorco wrote:
 Hi,
 
 I've been running pmacct for a while with PostgreSQL and it works well with 
 lots of netflow data presumably due to inbuilt handling of CIDR.
 
 For various reasons we're migrating to a mySQL backend to integrate with our 
 billing/customer portal environment better and mySQL does not appear to deal 
 well with strings in complex reports.
 
 We generate around 10GB of data per hour so when I want to do something like 
 select all the flows over a week and then spit out the top 50 AS sources, 
 show the total data from each of the top AS sources, and then the top 50 
 destination IP's in our network that those AS's are sending data to, mySQL 
 chokes for an hour trying to work that out whereas PostgreSQL takes a few 
 minutes but doesn't struggle. I've played with indexes and got some 
 improvement but nothing significant.
 
 I was looking into using ATON/NTOA functions in mySQL to store IP addresses 
 as integers instead which I expect should dramatically increase performance 
 especially when dealing with CIDR as I can query based on a range of integers.
 
 As far as I can see pmacct doesn't seem to have an option to tell it to do 
 its INSERTs of ip_src and ip_dst with INET_ATON and INET6_ATON. Is there such 
 an option or a patch that exists? If not is there a reason why this would be 
 a bad idea?
 
 -Scott
 
 The content of this message and any attachments may be privileged, 
 confidential or sensitive. Any unauthorised used is prohibited. Views 
 expressed in this message are those of the individual sender, except where 
 stated otherwise with appropriate authority. All pricing provided is valid at 
 the time of writing only and due to factors such as the exchange rate, may 
 change without notice. Sales are made subject to our Terms  Conditions, 
 available on our website or on request.
 

 ___
 pmacct-discussion mailing list
 http://www.pmacct.net/#mailinglists


___
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists


Re: [pmacct-discussion] mySQL ATON/NTOA and pmacct performance

2014-09-16 Thread itriA30110
Hi Scott and Paolo, 

  May I ask a question per your good experience?

  What is the performance difference between using string ipv4 and integer 
ipv4?  And potential impact?

  I have a PC running sfacctd that collects 2 giga-switches.  And it collects 
around 5M records into mysql per day. 

  I am satisfied with the default string ipv4 for its convenience of being 
readable.  And I would like to know the performance drawback. 

  Thank you. :)

-Original Message-
From: pmacct-discussion [mailto:pmacct-discussion-boun...@pmacct.net] On Behalf 
Of Paolo Lucente
Sent: Wednesday, September 17, 2014 7:29 AM
To: pmacct-discussion@pmacct.net
Subject: Re: [pmacct-discussion] mySQL ATON/NTOA and pmacct performance

Hi Scott,

You are looking for sql_num_hosts: true. This is tested to work with IPv4 
addresses. I'm reasonably sure it won't work with IPv6:
i seem to recall INET6_ATON was not available in 2011, the time of the original 
implementation of the feature. Should not be a biggie to introduce it.

Cheers,
Paolo

On Mon, Sep 15, 2014 at 11:54:25PM +, Scott Pettit - Vorco wrote:
 Hi,
 
 I've been running pmacct for a while with PostgreSQL and it works well with 
 lots of netflow data presumably due to inbuilt handling of CIDR.
 
 For various reasons we're migrating to a mySQL backend to integrate with our 
 billing/customer portal environment better and mySQL does not appear to deal 
 well with strings in complex reports.
 
 We generate around 10GB of data per hour so when I want to do something like 
 select all the flows over a week and then spit out the top 50 AS sources, 
 show the total data from each of the top AS sources, and then the top 50 
 destination IP's in our network that those AS's are sending data to, mySQL 
 chokes for an hour trying to work that out whereas PostgreSQL takes a few 
 minutes but doesn't struggle. I've played with indexes and got some 
 improvement but nothing significant.
 
 I was looking into using ATON/NTOA functions in mySQL to store IP addresses 
 as integers instead which I expect should dramatically increase performance 
 especially when dealing with CIDR as I can query based on a range of integers.
 
 As far as I can see pmacct doesn't seem to have an option to tell it to do 
 its INSERTs of ip_src and ip_dst with INET_ATON and INET6_ATON. Is there such 
 an option or a patch that exists? If not is there a reason why this would be 
 a bad idea?
 
 -Scott
 
 The content of this message and any attachments may be privileged, 
 confidential or sensitive. Any unauthorised used is prohibited. Views 
 expressed in this message are those of the individual sender, except where 
 stated otherwise with appropriate authority. All pricing provided is valid at 
 the time of writing only and due to factors such as the exchange rate, may 
 change without notice. Sales are made subject to our Terms  Conditions, 
 available on our website or on request.
 

 ___
 pmacct-discussion mailing list
 http://www.pmacct.net/#mailinglists


___
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists



本信件可能包含工研院機密資訊,非指定之收件者,請勿使用或揭露本信件內容,並請銷毀此信件。 
This email may contain confidential information. Please do not use or disclose 
it in any way and delete it if you are not the intended recipient.
___
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists

Re: [pmacct-discussion] mySQL ATON/NTOA and pmacct performance

2014-09-16 Thread Scott Pettit - Vorco
In MySQL (any SQL actually), string searches are computationally very
expensive when compared with integers. If you’re dealing with a relatively
small number of rows then the difference isn’t a problem.

It’s like asking “show me all the rows where the number is between 1 and
1000” versus “show me all the rows that look like something between one
and one thousand in words”.  The second statement is the same question but
requires a lot more processing because the database has to think about
every number as a word.  It might only take fractions of a second longer
for one row, but multiply that by 300 million rows (which is what I’m
trying to analyse) and the difference is huge. PostgreSQL handles this
really well as it has support for IP’s as a column type which presumably
converts to integers in the back end.

I will try pmacct with ipv4 on my test collector with sql_num_hosts and if
it works well then having the same for inet6_aton would be great.

-Scott


On 17/09/14 16:03, itria30...@itri.org.tw itria30...@itri.org.tw wrote:

Hi Scott and Paolo,

  May I ask a question per your good experience?

  What is the performance difference between using string ipv4 and
integer ipv4?  And potential impact?

  I have a PC running sfacctd that collects 2 giga-switches.  And it
collects around 5M records into mysql per day.

  I am satisfied with the default string ipv4 for its convenience of
being readable.  And I would like to know the performance drawback.


 The content of this message and any attachments may be privileged, 
confidential or sensitive. Any unauthorised used is prohibited. Views expressed 
in this message are those of the individual sender, except where stated 
otherwise with appropriate authority. All pricing provided is valid at the time 
of writing only and due to factors such as the exchange rate, may change 
without notice. Sales are made subject to our Terms  Conditions, available on 
our website or on request.

___
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists

[pmacct-discussion] mySQL ATON/NTOA and pmacct performance

2014-09-15 Thread Scott Pettit - Vorco
Hi,

I've been running pmacct for a while with PostgreSQL and it works well with 
lots of netflow data presumably due to inbuilt handling of CIDR.

For various reasons we're migrating to a mySQL backend to integrate with our 
billing/customer portal environment better and mySQL does not appear to deal 
well with strings in complex reports.

We generate around 10GB of data per hour so when I want to do something like 
select all the flows over a week and then spit out the top 50 AS sources, show 
the total data from each of the top AS sources, and then the top 50 destination 
IP's in our network that those AS's are sending data to, mySQL chokes for an 
hour trying to work that out whereas PostgreSQL takes a few minutes but doesn't 
struggle. I've played with indexes and got some improvement but nothing 
significant.

I was looking into using ATON/NTOA functions in mySQL to store IP addresses as 
integers instead which I expect should dramatically increase performance 
especially when dealing with CIDR as I can query based on a range of integers.

As far as I can see pmacct doesn't seem to have an option to tell it to do its 
INSERTs of ip_src and ip_dst with INET_ATON and INET6_ATON. Is there such an 
option or a patch that exists? If not is there a reason why this would be a bad 
idea?

-Scott

The content of this message and any attachments may be privileged, confidential 
or sensitive. Any unauthorised used is prohibited. Views expressed in this 
message are those of the individual sender, except where stated otherwise with 
appropriate authority. All pricing provided is valid at the time of writing 
only and due to factors such as the exchange rate, may change without notice. 
Sales are made subject to our Terms  Conditions, available on our website or 
on request.

___
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists