Re: [pmacct-discussion] mySQL ATON/NTOA and pmacct performance
Hi Paolo and Scott, Your discussion pointed out an important issue - what is the proper design when it comes to high frequently IP-based queries? Inspired by this thread, my idea is to use postgresql and count on its cidr/inet function. (I am using mysql and store ipv4 as string(15)) This way the IP address is readable and processing IP address is also efficient. It seems worthy to try. I will try this approach later. Any comment is appreciated. -Original Message- From: pmacct-discussion [mailto:pmacct-discussion-boun...@pmacct.net] On Behalf Of Paolo Lucente Sent: Thursday, September 18, 2014 8:44 AM To: Scott Pettit - Vorco Cc: pmacct-discussion@pmacct.net Subject: Re: [pmacct-discussion] mySQL ATON/NTOA and pmacct performance Hi Scott, Great. Let me know how your testing goes. Should you hit any bugs or gaps, feel free to contact me privately to solve them. Then we can summarize on the list. Cheers, Paolo On Wed, Sep 17, 2014 at 04:35:05AM +, Scott Pettit - Vorco wrote: In MySQL (any SQL actually), string searches are computationally very expensive when compared with integers. If you’re dealing with a relatively small number of rows then the difference isn’t a problem. It’s like asking “show me all the rows where the number is between 1 and 1000” versus “show me all the rows that look like something between one and one thousand in words”. The second statement is the same question but requires a lot more processing because the database has to think about every number as a word. It might only take fractions of a second longer for one row, but multiply that by 300 million rows (which is what I’m trying to analyse) and the difference is huge. PostgreSQL handles this really well as it has support for IP’s as a column type which presumably converts to integers in the back end. I will try pmacct with ipv4 on my test collector with sql_num_hosts and if it works well then having the same for inet6_aton would be great. -Scott On 17/09/14 16:03, itria30...@itri.org.tw itria30...@itri.org.tw wrote: Hi Scott and Paolo, May I ask a question per your good experience? What is the performance difference between using string ipv4 and integer ipv4? And potential impact? I have a PC running sfacctd that collects 2 giga-switches. And it collects around 5M records into mysql per day. I am satisfied with the default string ipv4 for its convenience of being readable. And I would like to know the performance drawback. The content of this message and any attachments may be privileged, confidential or sensitive. Any unauthorised used is prohibited. Views expressed in this message are those of the individual sender, except where stated otherwise with appropriate authority. All pricing provided is valid at the time of writing only and due to factors such as the exchange rate, may change without notice. Sales are made subject to our Terms Conditions, available on our website or on request. ___ pmacct-discussion mailing list http://www.pmacct.net/#mailinglists 本信件可能包含工研院機密資訊,非指定之收件者,請勿使用或揭露本信件內容,並請銷毀此信件。 This email may contain confidential information. Please do not use or disclose it in any way and delete it if you are not the intended recipient. ___ pmacct-discussion mailing list http://www.pmacct.net/#mailinglists
Re: [pmacct-discussion] mySQL ATON/NTOA and pmacct performance
Hi guys, I just wanted to summarize that: It looks like a fine way to address IPs in a DB seems to be the pgsql way. Fast, and you can easily select data using subnets without relying on regexp. While in mysql you either obtain a int value difficult to handle, or a string very slow to search (choose between plague and cholera). For some reasons Scott is migrating from something that works to something that doesn't match the needs and causes problems. So, the DB engine choice was obviously not decided according to the needs. So the reasons were wrong. Rollback. Problem solved: there's no problem :) regards, Olivier Le 18 sept. 2014 à 12:04, itria30...@itri.org.tw a écrit : Hi Paolo and Scott, Your discussion pointed out an important issue - what is the proper design when it comes to high frequently IP-based queries? Inspired by this thread, my idea is to use postgresql and count on its cidr/inet function. (I am using mysql and store ipv4 as string(15)) This way the IP address is readable and processing IP address is also efficient. It seems worthy to try. I will try this approach later. Any comment is appreciated. -Original Message- From: pmacct-discussion [mailto:pmacct-discussion-boun...@pmacct.net] On Behalf Of Paolo Lucente Sent: Thursday, September 18, 2014 8:44 AM To: Scott Pettit - Vorco Cc: pmacct-discussion@pmacct.net Subject: Re: [pmacct-discussion] mySQL ATON/NTOA and pmacct performance Hi Scott, Great. Let me know how your testing goes. Should you hit any bugs or gaps, feel free to contact me privately to solve them. Then we can summarize on the list. Cheers, Paolo On Wed, Sep 17, 2014 at 04:35:05AM +, Scott Pettit - Vorco wrote: In MySQL (any SQL actually), string searches are computationally very expensive when compared with integers. If you’re dealing with a relatively small number of rows then the difference isn’t a problem. It’s like asking “show me all the rows where the number is between 1 and 1000” versus “show me all the rows that look like something between one and one thousand in words”. The second statement is the same question but requires a lot more processing because the database has to think about every number as a word. It might only take fractions of a second longer for one row, but multiply that by 300 million rows (which is what I’m trying to analyse) and the difference is huge. PostgreSQL handles this really well as it has support for IP’s as a column type which presumably converts to integers in the back end. I will try pmacct with ipv4 on my test collector with sql_num_hosts and if it works well then having the same for inet6_aton would be great. -Scott On 17/09/14 16:03, itria30...@itri.org.tw itria30...@itri.org.tw wrote: Hi Scott and Paolo, May I ask a question per your good experience? What is the performance difference between using string ipv4 and integer ipv4? And potential impact? I have a PC running sfacctd that collects 2 giga-switches. And it collects around 5M records into mysql per day. I am satisfied with the default string ipv4 for its convenience of being readable. And I would like to know the performance drawback. ___ pmacct-discussion mailing list http://www.pmacct.net/#mailinglists
Re: [pmacct-discussion] mySQL ATON/NTOA and pmacct performance
Hi Scott, Great. Let me know how your testing goes. Should you hit any bugs or gaps, feel free to contact me privately to solve them. Then we can summarize on the list. Cheers, Paolo On Wed, Sep 17, 2014 at 04:35:05AM +, Scott Pettit - Vorco wrote: In MySQL (any SQL actually), string searches are computationally very expensive when compared with integers. If you’re dealing with a relatively small number of rows then the difference isn’t a problem. It’s like asking “show me all the rows where the number is between 1 and 1000” versus “show me all the rows that look like something between one and one thousand in words”. The second statement is the same question but requires a lot more processing because the database has to think about every number as a word. It might only take fractions of a second longer for one row, but multiply that by 300 million rows (which is what I’m trying to analyse) and the difference is huge. PostgreSQL handles this really well as it has support for IP’s as a column type which presumably converts to integers in the back end. I will try pmacct with ipv4 on my test collector with sql_num_hosts and if it works well then having the same for inet6_aton would be great. -Scott On 17/09/14 16:03, itria30...@itri.org.tw itria30...@itri.org.tw wrote: Hi Scott and Paolo, May I ask a question per your good experience? What is the performance difference between using string ipv4 and integer ipv4? And potential impact? I have a PC running sfacctd that collects 2 giga-switches. And it collects around 5M records into mysql per day. I am satisfied with the default string ipv4 for its convenience of being readable. And I would like to know the performance drawback. The content of this message and any attachments may be privileged, confidential or sensitive. Any unauthorised used is prohibited. Views expressed in this message are those of the individual sender, except where stated otherwise with appropriate authority. All pricing provided is valid at the time of writing only and due to factors such as the exchange rate, may change without notice. Sales are made subject to our Terms Conditions, available on our website or on request. ___ pmacct-discussion mailing list http://www.pmacct.net/#mailinglists
Re: [pmacct-discussion] mySQL ATON/NTOA and pmacct performance
Hi Scott, You are looking for sql_num_hosts: true. This is tested to work with IPv4 addresses. I'm reasonably sure it won't work with IPv6: i seem to recall INET6_ATON was not available in 2011, the time of the original implementation of the feature. Should not be a biggie to introduce it. Cheers, Paolo On Mon, Sep 15, 2014 at 11:54:25PM +, Scott Pettit - Vorco wrote: Hi, I've been running pmacct for a while with PostgreSQL and it works well with lots of netflow data presumably due to inbuilt handling of CIDR. For various reasons we're migrating to a mySQL backend to integrate with our billing/customer portal environment better and mySQL does not appear to deal well with strings in complex reports. We generate around 10GB of data per hour so when I want to do something like select all the flows over a week and then spit out the top 50 AS sources, show the total data from each of the top AS sources, and then the top 50 destination IP's in our network that those AS's are sending data to, mySQL chokes for an hour trying to work that out whereas PostgreSQL takes a few minutes but doesn't struggle. I've played with indexes and got some improvement but nothing significant. I was looking into using ATON/NTOA functions in mySQL to store IP addresses as integers instead which I expect should dramatically increase performance especially when dealing with CIDR as I can query based on a range of integers. As far as I can see pmacct doesn't seem to have an option to tell it to do its INSERTs of ip_src and ip_dst with INET_ATON and INET6_ATON. Is there such an option or a patch that exists? If not is there a reason why this would be a bad idea? -Scott The content of this message and any attachments may be privileged, confidential or sensitive. Any unauthorised used is prohibited. Views expressed in this message are those of the individual sender, except where stated otherwise with appropriate authority. All pricing provided is valid at the time of writing only and due to factors such as the exchange rate, may change without notice. Sales are made subject to our Terms Conditions, available on our website or on request. ___ pmacct-discussion mailing list http://www.pmacct.net/#mailinglists ___ pmacct-discussion mailing list http://www.pmacct.net/#mailinglists
Re: [pmacct-discussion] mySQL ATON/NTOA and pmacct performance
Hi Scott and Paolo, May I ask a question per your good experience? What is the performance difference between using string ipv4 and integer ipv4? And potential impact? I have a PC running sfacctd that collects 2 giga-switches. And it collects around 5M records into mysql per day. I am satisfied with the default string ipv4 for its convenience of being readable. And I would like to know the performance drawback. Thank you. :) -Original Message- From: pmacct-discussion [mailto:pmacct-discussion-boun...@pmacct.net] On Behalf Of Paolo Lucente Sent: Wednesday, September 17, 2014 7:29 AM To: pmacct-discussion@pmacct.net Subject: Re: [pmacct-discussion] mySQL ATON/NTOA and pmacct performance Hi Scott, You are looking for sql_num_hosts: true. This is tested to work with IPv4 addresses. I'm reasonably sure it won't work with IPv6: i seem to recall INET6_ATON was not available in 2011, the time of the original implementation of the feature. Should not be a biggie to introduce it. Cheers, Paolo On Mon, Sep 15, 2014 at 11:54:25PM +, Scott Pettit - Vorco wrote: Hi, I've been running pmacct for a while with PostgreSQL and it works well with lots of netflow data presumably due to inbuilt handling of CIDR. For various reasons we're migrating to a mySQL backend to integrate with our billing/customer portal environment better and mySQL does not appear to deal well with strings in complex reports. We generate around 10GB of data per hour so when I want to do something like select all the flows over a week and then spit out the top 50 AS sources, show the total data from each of the top AS sources, and then the top 50 destination IP's in our network that those AS's are sending data to, mySQL chokes for an hour trying to work that out whereas PostgreSQL takes a few minutes but doesn't struggle. I've played with indexes and got some improvement but nothing significant. I was looking into using ATON/NTOA functions in mySQL to store IP addresses as integers instead which I expect should dramatically increase performance especially when dealing with CIDR as I can query based on a range of integers. As far as I can see pmacct doesn't seem to have an option to tell it to do its INSERTs of ip_src and ip_dst with INET_ATON and INET6_ATON. Is there such an option or a patch that exists? If not is there a reason why this would be a bad idea? -Scott The content of this message and any attachments may be privileged, confidential or sensitive. Any unauthorised used is prohibited. Views expressed in this message are those of the individual sender, except where stated otherwise with appropriate authority. All pricing provided is valid at the time of writing only and due to factors such as the exchange rate, may change without notice. Sales are made subject to our Terms Conditions, available on our website or on request. ___ pmacct-discussion mailing list http://www.pmacct.net/#mailinglists ___ pmacct-discussion mailing list http://www.pmacct.net/#mailinglists 本信件可能包含工研院機密資訊,非指定之收件者,請勿使用或揭露本信件內容,並請銷毀此信件。 This email may contain confidential information. Please do not use or disclose it in any way and delete it if you are not the intended recipient. ___ pmacct-discussion mailing list http://www.pmacct.net/#mailinglists
Re: [pmacct-discussion] mySQL ATON/NTOA and pmacct performance
In MySQL (any SQL actually), string searches are computationally very expensive when compared with integers. If you’re dealing with a relatively small number of rows then the difference isn’t a problem. It’s like asking “show me all the rows where the number is between 1 and 1000” versus “show me all the rows that look like something between one and one thousand in words”. The second statement is the same question but requires a lot more processing because the database has to think about every number as a word. It might only take fractions of a second longer for one row, but multiply that by 300 million rows (which is what I’m trying to analyse) and the difference is huge. PostgreSQL handles this really well as it has support for IP’s as a column type which presumably converts to integers in the back end. I will try pmacct with ipv4 on my test collector with sql_num_hosts and if it works well then having the same for inet6_aton would be great. -Scott On 17/09/14 16:03, itria30...@itri.org.tw itria30...@itri.org.tw wrote: Hi Scott and Paolo, May I ask a question per your good experience? What is the performance difference between using string ipv4 and integer ipv4? And potential impact? I have a PC running sfacctd that collects 2 giga-switches. And it collects around 5M records into mysql per day. I am satisfied with the default string ipv4 for its convenience of being readable. And I would like to know the performance drawback. The content of this message and any attachments may be privileged, confidential or sensitive. Any unauthorised used is prohibited. Views expressed in this message are those of the individual sender, except where stated otherwise with appropriate authority. All pricing provided is valid at the time of writing only and due to factors such as the exchange rate, may change without notice. Sales are made subject to our Terms Conditions, available on our website or on request. ___ pmacct-discussion mailing list http://www.pmacct.net/#mailinglists
[pmacct-discussion] mySQL ATON/NTOA and pmacct performance
Hi, I've been running pmacct for a while with PostgreSQL and it works well with lots of netflow data presumably due to inbuilt handling of CIDR. For various reasons we're migrating to a mySQL backend to integrate with our billing/customer portal environment better and mySQL does not appear to deal well with strings in complex reports. We generate around 10GB of data per hour so when I want to do something like select all the flows over a week and then spit out the top 50 AS sources, show the total data from each of the top AS sources, and then the top 50 destination IP's in our network that those AS's are sending data to, mySQL chokes for an hour trying to work that out whereas PostgreSQL takes a few minutes but doesn't struggle. I've played with indexes and got some improvement but nothing significant. I was looking into using ATON/NTOA functions in mySQL to store IP addresses as integers instead which I expect should dramatically increase performance especially when dealing with CIDR as I can query based on a range of integers. As far as I can see pmacct doesn't seem to have an option to tell it to do its INSERTs of ip_src and ip_dst with INET_ATON and INET6_ATON. Is there such an option or a patch that exists? If not is there a reason why this would be a bad idea? -Scott The content of this message and any attachments may be privileged, confidential or sensitive. Any unauthorised used is prohibited. Views expressed in this message are those of the individual sender, except where stated otherwise with appropriate authority. All pricing provided is valid at the time of writing only and due to factors such as the exchange rate, may change without notice. Sales are made subject to our Terms Conditions, available on our website or on request. ___ pmacct-discussion mailing list http://www.pmacct.net/#mailinglists