Re: [pmacct-discussion] Pmacct data inconsistencies between tables.

2010-02-19 Thread Paolo Lucente
Hi Daniel,

I see the 1 minute table contains duplicates - or rather,
_everything_ extracted appears twice. It could be handy to
add tags to the aggregation method and assign a different
'post_tag' to each plugin, so as to identify which plugin
is generating them.

I also wonder: what does the primary key of the 1 min table
look like? Is it any different from the 1 hour table? With
sql_dont_try_update turned on and the default indexing,
duplicates are not possible.

Also, on a closer look at the configuration you posted, I see
no aggregate_filter directives are specified (see EXAMPLES):
this means each plugin collects both inbound and outbound
traffic and tries to write it to the same table. So you can
either remove one set of plugins or craft a proper
aggregate_filter so that each does only its bit of the job.

With regard to the missing tuples, from the few checks I've
done, it is always the case that a tuple is present in the
1 hour table but missing from the 1 minute one. This can very
well be the result of the shared 'sql_preprocess: minb = 1000'
directive: a flow can accumulate more than 1000 bytes in 1
hour but not in 1 minute - and hence it's accounted in one
table and stripped from the other.
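
To illustrate the threshold effect described above, here is a
minimal Python sketch. It is not pmacct code: the flow data is
made up, the helper name kept_bytes is hypothetical, and it
assumes that an entry is dropped when its byte counter for the
time bin is below the minb value.

```python
MINB = 1000  # threshold taken from 'sql_preprocess: minb = 1000'


def kept_bytes(per_minute_bytes, bin_minutes, minb=MINB):
    """Sum the bytes that survive a minimum-bytes check applied per time bin.

    Assumption: a bin whose total is below minb is discarded entirely.
    """
    kept = 0
    for start in range(0, len(per_minute_bytes), bin_minutes):
        bin_total = sum(per_minute_bytes[start:start + bin_minutes])
        if bin_total >= minb:
            kept += bin_total
    return kept


# Hypothetical flow: 100 bytes every minute for one hour.
flow = [100] * 60

short_total = kept_bytes(flow, bin_minutes=1)   # every 1-minute bin is below minb
long_total = kept_bytes(flow, bin_minutes=60)   # the single 1-hour bin holds 6000 bytes

print(short_total, long_total)
```

In this made-up case the whole flow vanishes from the 1 minute
view while the 1 hour view keeps all of it, which is the
direction of mismatch the directive would produce.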

Given the sql_preprocess, you should never expect the counters
to match, for the same reason as above. For a more
apples-to-apples comparison, consider removing it and putting
it back once you are confident everything is all right.

Finally, unrelated to the issue: please for the benefit of
public archives, don't send attachments to the list.

Cheers,
Paolo

 

On Fri, Feb 19, 2010 at 12:11:24PM +, Daniel Levy wrote:
 Hi Paolo,
 
 Here is a report with differences between the two tables.  There are a
 lot of differences between the download figures as well as some
 instances where IP addresses are only found in one table (see
 report.txt). I also have some raw data for the time periods between
 10:00 and 11:00 on 11/02/2010, which are being sent separately via
 yousendit.
 
 Regards
 
 -- 
 Daniel Levy
 
 Aptivate | http://www.aptivate.org/ | +44 (0)1223 760887
 The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES
 
 Aptivate is a not-for-profit company registered in England and Wales
 with company number 04980791. 
 
 
 

Re: [pmacct-discussion] Pmacct data inconsistencies between tables.

2010-02-19 Thread Chris Wilson
Hi Paolo and Daniel,

(please allow me to jump in as I may be able to help here, despite 
currently being in country working on a project.)

On Fri, 19 Feb 2010, Paolo Lucente wrote:

 I also wonder: what does the primary key of the 1 min table look like? Is 
 it any different from the 1 hour table? With sql_dont_try_update 
 turned on and the default indexing, duplicates are not possible.

I deleted the primary key from that table because it should not be 
necessary (there should not be any duplicates if everything is configured 
correctly) and it makes inserts extremely slow (by a factor of 10-100) 
when the table gets large.
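
To make the trade-off concrete, here is a minimal sketch of what a 
primary key does when the same row is inserted twice. It uses an 
in-memory SQLite database rather than the MySQL backend from this 
thread, and the table names, column names and sample row are all 
hypothetical, not the pmacct schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical tables with the same columns; only one has a primary key.
conn.execute(
    "CREATE TABLE with_pk (ip_src TEXT, ip_dst TEXT, stamp TEXT, bytes INTEGER, "
    "PRIMARY KEY (ip_src, ip_dst, stamp))"
)
conn.execute(
    "CREATE TABLE no_pk (ip_src TEXT, ip_dst TEXT, stamp TEXT, bytes INTEGER)"
)

row = ("192.0.2.10", "10.0.156.7", "2010-02-11 10:00:00", 1500)


def insert_twice(table):
    """Insert the same row twice with plain INSERTs; return (row count, byte sum)."""
    for _ in range(2):
        try:
            conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", row)
        except sqlite3.IntegrityError:
            # The primary key rejects the second, identical row.
            pass
    return conn.execute(f"SELECT COUNT(*), SUM(bytes) FROM {table}").fetchone()


with_pk_result = insert_twice("with_pk")
no_pk_result = insert_twice("no_pk")

print(with_pk_result, no_pk_result)
```

Without the key, the duplicate row is stored and the byte total 
doubles; with it, the second insert is rejected.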

 Also, on a closer look at the configuration you posted, I see no 
 aggregate_filter directives are specified (see EXAMPLES): this means 
 each plugin collects both inbound and outbound traffic and tries to 
 write it to the same table. So you can either remove one set of plugins 
 or craft a proper aggregate_filter so that each does only its bit of the job.

The inbound and outbound traffic are supposed to go into the same table, 
but you're right that the aggregate_filter appears to be missing and this 
is almost certainly the cause of the duplicate records in the short 
table. Daniel, could you please add something like this:

aggregate_filter[inbound1]: dst net 10.0.156.0/24
aggregate_filter[outbound1]: src net 10.0.156.0/24
aggregate_filter[inbound2]: dst net 10.0.156.0/24
aggregate_filter[outbound2]: src net 10.0.156.0/24
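
As a rough illustration of why the filters matter, the Python sketch 
below models two plugins writing to one table. It is only a model of 
the idea, not pmacct itself: the sample flows, the 192.0.2.10 remote 
address and the helper names are made up, and only the 10.0.156.0/24 
network comes from the lines above.

```python
import ipaddress

LOCAL_NET = ipaddress.ip_network("10.0.156.0/24")


def inbound(src, dst):
    """Mimics 'dst net 10.0.156.0/24'."""
    return ipaddress.ip_address(dst) in LOCAL_NET


def outbound(src, dst):
    """Mimics 'src net 10.0.156.0/24'."""
    return ipaddress.ip_address(src) in LOCAL_NET


def accept_all(src, dst):
    """A plugin with no aggregate_filter sees every flow."""
    return True


def rows_written(flows, plugin_filters):
    """Return one row per (flow, plugin) pair whose filter accepts the flow."""
    rows = []
    for src, dst, nbytes in flows:
        for name, keep in plugin_filters.items():
            if keep(src, dst):
                rows.append((name, src, dst, nbytes))
    return rows


# Hypothetical traffic: one download and one upload for a local host.
flows = [
    ("192.0.2.10", "10.0.156.7", 1500),
    ("10.0.156.7", "192.0.2.10", 400),
]

unfiltered = rows_written(flows, {"inbound1": accept_all, "outbound1": accept_all})
filtered = rows_written(flows, {"inbound1": inbound, "outbound1": outbound})

print(len(unfiltered), len(filtered))
```

With no filters each flow is written once per plugin, so the table 
receives every row twice; with the filters each flow is written by 
exactly one plugin.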

However, I'm surprised that this doesn't also happen in the long table?

 With regard to the missing tuples, from the few checks I've done, it is 
 always the case that a tuple is present in the 1 hour table but missing 
 from the 1 minute one. This can very well be the result of the shared 
 'sql_preprocess: minb = 1000' directive: a flow can accumulate more than 
 1000 bytes in 1 hour but not in 1 minute - and hence it's accounted in 
 one table and stripped from the other.

Yes, I would expect the long table totals to be slightly more than the 
short table ones for this reason. However, the problem that we're seeing 
is the opposite: the totals calculated from the long table are less than 
those from the short table, even though the long table includes flows that 
the short table doesn't.

And, while this might be accounted for by the duplicate flows in the short 
table, the same should apply to the long table, so I think it should have 
balanced out.

 Given the sql_preprocess you should never expect counters to match for 
 the same reason as above. For a more apples-to-apples comparison, 
 you should consider removing it and putting it back once you are 
 confident everything is all right.

Unfortunately we cannot do this in the production environment, as the 
number of rows of tiny flows (which are effectively noise) completely 
dwarfs the real data, overloads our firewall's CPU and disk space, and 
makes querying so slow that the data is useless. This is where a test lab 
environment would be useful.

Thanks for your help with this :)

Cheers, Chris.
-- 
Aptivate | http://www.aptivate.org | Phone: +44 1223 760887
The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES

Aptivate is a not-for-profit company registered in England and Wales
with company number 04980791.

___
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists


Re: [pmacct-discussion] Pmacct data inconsistencies between tables.

2010-02-19 Thread Karl O. Pinc
On 02/19/2010 07:42:08 AM, Chris Wilson wrote:
 Hi Paolo and Daniel,

 I deleted the primary key from that table because it should not be
 necessary (there should not be any duplicates if everything is
 configured correctly) and it makes inserts extremely slow (by a
 factor of 10-100) when the table gets large.

FWIW, the automatic sequential key generation speed is unrelated
to table size when using postgresql.

Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein




Re: [pmacct-discussion] Pmacct data inconsistencies between tables.

2010-02-19 Thread Chris Wilson
Hi Karl,

On Fri, 19 Feb 2010, Karl O. Pinc wrote:
 On 02/19/2010 07:42:08 AM, Chris Wilson wrote:
 
  I deleted the primary key from that table because it should not be 
  necessary (there should not be any duplicates if everything is 
  configured correctly) and it makes inserts extremely slow (by a factor 
  of 10-100) when the table gets large.
 
 FWIW, the automatic sequential key generation speed is unrelated
 to table size when using postgresql.

There is no sequence to generate as far as I know. The problem is the size 
of the index file and the fact that it has to be rewritten for every 
insert (or block of inserts), which makes insertion slower as the 
database grows.

Cheers, Chris.
-- 
Aptivate | http://www.aptivate.org | Phone: +44 1223 760887
The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES

Aptivate is a not-for-profit company registered in England and Wales
with company number 04980791.



Re: [pmacct-discussion] Pmacct data inconsistencies between tables.

2010-02-19 Thread Daniel Levy
Thanks, Paolo, for your help. Sorry about the attachments. I have
updated the pmacctd.conf file and removed a couple of the plugins to
prevent duplicate data.

Regards

-- 
Daniel Levy

Aptivate | http://www.aptivate.org/ | +44 (0)1223 760887
The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES

Aptivate is a not-for-profit company registered in England and Wales
with company number 04980791. 



Chris Wilson wrote:
 Sorry, I realised just after I hit Send (as usual):

 On Fri, 19 Feb 2010, Chris Wilson wrote:

   
 I also wonder: what does the primary key of the 1 min table look like? 
 Is it any different from the 1 hour table? With sql_dont_try_update 
 turned on and the default indexing, duplicates are not possible.
   
 I deleted the primary key from that table because it should not be 
 necessary (there should not be any duplicates if everything is 
 configured correctly) and it makes inserts extremely slow (by a factor 
 of 10-100) when the table gets large.

 
 Also, on a closer look at the configuration you posted, I see no 
 aggregate_filter directives are specified (see EXAMPLES): this means 
 each plugin collects both inbound and outbound traffic and tries to 
 write it to the same table. So you can either remove one set of plugins 
 or craft a proper aggregate_filter so that each does only its bit of the job.
   
 However, I'm surprised that this doesn't also happen in the long table?
 

 I know why: the primary key still exists on the long table, rejecting the 
 duplicate entries. So this almost certainly accounts for the problem.

 Cheers, Chris.
   



Re: [pmacct-discussion] Pmacct data inconsistencies between tables.

2010-02-19 Thread Karl O. Pinc
On 02/19/2010 10:24:57 AM, Chris Wilson wrote:
 Hi Karl,
 
 On Fri, 19 Feb 2010, Karl O. Pinc wrote:

  
  FWIW, the automatic sequential key generation speed is unrelated
  to table size when using postgresql.
 
 There is no sequence to generate as far as I know. The problem is the
 size of the index file, and the fact that it has to be rewritten for
 every insert (or block of inserts) that makes insertion get slower as
 database size increases.

Ah, thanks.  Looks like there's no getting around that if you're going
to check for duplicates.

What backend db are you using?  Such a slowdown does not seem
warranted, especially for an index on the primary key, which
can be stored and ordered along with the tuples themselves.
Or maybe that's the problem: ordering the tuples sucks up RAM,
and you'd be better off if the index were stored separately from
the tuples?  Thinking out loud here.

Regards,

Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein




Re: [pmacct-discussion] Pmacct data inconsistencies between tables.

2010-02-16 Thread Paolo Lucente
Hi Daniel,

Going through the data and comparing traffic figures is,
IMHO, the more practical approach - compared to trying to
reproduce the issue in a controlled environment. Once you
discover a discrepancy, it would be great to receive the
contributing data of each report to see where the issue
comes from.

It's also true that version 0.9.1 is almost 5 years old; I
would strongly encourage you to upgrade it. I believe Ubuntu
also packages versions 0.11.4 and 0.11.6, if you really
don't like the idea of compiling 0.12 yourself (which would
be my preferred approach).

Let me know.

Cheers,
Paolo





Re: [pmacct-discussion] Pmacct data inconsistencies between tables.

2010-02-15 Thread Daniel Levy
Hi Paolo,

Thanks for getting back to me. The version of pmacct being used is
0.9.1-1ubuntu1. I'm not sure how the problem was discovered, but I have
asked the person who found the problem to tell me and I will forward you
the response. As for the reports, I'm not entirely sure what you need. I
am considering going through the database data for each hour and
comparing the total figures for uploaded and downloaded packets, per IP
address between the two tables.  Would this give you the information
you're looking for?
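
A comparison along these lines could be sketched as follows. This
is only an illustration under stated assumptions: it uses an
in-memory SQLite database instead of MySQL, the sample rows are
invented, and the column names (ip_dst, stamp_inserted, bytes) are
assumed rather than checked against the real tables. It sums bytes;
a packets column could be compared the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed, simplified layout for both tables.
for table in ("short_data_table", "long_data_table"):
    conn.execute(
        f"CREATE TABLE {table} (ip_dst TEXT, stamp_inserted TEXT, bytes INTEGER)"
    )

# Invented sample rows: per-minute entries and per-hour entries.
conn.executemany("INSERT INTO short_data_table VALUES (?, ?, ?)", [
    ("10.0.156.7", "2010-02-11 10:00:00", 2000),
    ("10.0.156.7", "2010-02-11 10:01:00", 3000),
    ("10.0.156.8", "2010-02-11 10:05:00", 1200),
])
conn.executemany("INSERT INTO long_data_table VALUES (?, ?, ?)", [
    ("10.0.156.7", "2010-02-11 10:00:00", 5000),
    ("10.0.156.8", "2010-02-11 10:00:00", 1000),
])


def hourly_totals(table):
    """Return {(ip_dst, hour): total bytes}, truncating timestamps to the hour."""
    query = f"""
        SELECT ip_dst,
               strftime('%Y-%m-%d %H:00:00', stamp_inserted) AS hour,
               SUM(bytes)
        FROM {table}
        GROUP BY ip_dst, hour
    """
    return {(ip, hour): total for ip, hour, total in conn.execute(query)}


short = hourly_totals("short_data_table")
long_ = hourly_totals("long_data_table")

# Keep only the (IP, hour) pairs whose totals differ or exist in one table only.
mismatches = {
    key: (short.get(key), long_.get(key))
    for key in sorted(set(short) | set(long_))
    if short.get(key) != long_.get(key)
}

for key, (short_total, long_total) in mismatches.items():
    print(key, "short:", short_total, "long:", long_total)
```

Listing only the mismatching (IP, hour) pairs keeps the report
short and points directly at the hours worth inspecting.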

-- 
Daniel Levy

Aptivate | http://www.aptivate.org/ | +44 (0)1223 760887
The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES

Aptivate is a not-for-profit company registered in England and Wales
with company number 04980791. 





Re: [pmacct-discussion] Pmacct data inconsistencies between tables.

2010-02-12 Thread Paolo Lucente
Hi Daniel,

Unfortunately the configuration doesn't make it evident where the
issue might be. The 'sql_dont_try_update' directive protects well
against duplicate tuples - so I'm rather inclined to exclude that
reason.

Which version are you using? How did you discover the issue - i.e.
did you upgrade recently from a previous version, or is this a fresh
installation? Finally, is it possible to get - privately - two
reports, one from each table, for the same time period? Say, one
or better two hours?

Let me know.

Cheers,
Paolo


On Fri, Feb 12, 2010 at 03:19:38PM +, Daniel Levy wrote:
 Hi,
 
 I'm using pmacct to store data in two tables, one containing data
 recorded on a per minute basis, the other containing data recorded on an
 hourly basis. When I get data for the first table over a period of three
 hours, the download traffic (calculated by adding up the bytes field for
 traffic where the ip_dst value is from a machine on the local network)
 for one IP address on the network is 3,719,772,656 bytes. The download
 traffic from the second table for the same IP address over a period of
 one week, including the three hour period mentioned above, is
 significantly smaller (2,114,286,512 bytes) where I would expect it to
 be much larger and I can't figure out why. A slightly modified version
 of the contents of my pmacctd.conf file is given below. Can anyone help?
 
 daemonize: true
 pidfile: /var/run/pmacctd.pid
 syslog: daemon
 
 plugins: mysql[inbound1], mysql[outbound1], mysql[inbound2], mysql[outbound2]
 
 aggregate[inbound1]: src_host, src_port, dst_host, dst_port, proto
 aggregate[outbound1]: src_host, src_port, dst_host, dst_port, proto
 aggregate[inbound2]: src_host, src_port, dst_host, dst_port, proto
 aggregate[outbound2]: src_host, src_port, dst_host, dst_port, proto
 
 pcap_filter: not (src and dst net 0.0.0.0/24)
 
 
 sql_db: pmacct
 sql_table[inbound1]: short_data_table
 sql_table[outbound1]: short_data_table
 
 sql_table[inbound2]: long_data_table
 sql_table[outbound2]: long_data_table
 
 sql_history[inbound1]: 1m
 sql_history[outbound1]: 1m
 sql_history[inbound2]: 1h
 sql_history[outbound2]: 1h
 
 sql_history_roundoff[inbound1]: m
 sql_history_roundoff[outbound1]: m
 sql_history_roundoff[inbound2]: h
 sql_history_roundoff[outbound2]: h
 sql_table_version: 6
 sql_host: localhost
 sql_user: auser
 sql_passwd: apass
 
 sql_refresh_time[inbound1]: 60
 sql_refresh_time[outbound1]: 60
 sql_refresh_time[inbound2]: 3600
 sql_refresh_time[outbound2]: 3600
 sql_dont_try_update: true
 sql_optimize_clauses: true
 
 sql_preprocess: minb = 1000
 
 Regards
 
 -- 
 Daniel Levy
 
 