Re: [collectd] curl_xml plugin
On Mon, Jan 11, 2010 at 9:26 PM, Florian Forster o...@verplant.org wrote:

> Hi Amit,
>
> thank you very much for your patch :)

Thanks for applying the patch :)

> I did the following changes:
>
> * Re-order all of the functions to get rid of the forward declarations.
> * Replace the string and boolean handling functions for config options
>   with functions declared in src/configfile.h. This way the
>   functionality is not duplicated.
> * Put the type_instance setting and value parsing into separate
>   functions. This makes "cx_submit_xpath_values" considerably shorter.
>
> Please let me know if my changes broke anything -- unfortunately I
> didn't have the possibility to test them :/

I did some sanity testing and the plugin seems to be working fine.

>   #define CX_KEY_MAGIC 0x43484b59UL /* CHKY */
>   #define CX_IS_KEY(key) (key)->magic == CX_KEY_MAGIC
>
> What do we need this hack for? I've found only one place where
> "c_avl_insert" is called, so the elements in the tree should all be of
> the same type, right? It'd be great if you could remove or document
> this.

There is no need for this magic key. It just happens to be there since I
borrowed the code from curl_json :). I will clean this up.

> Maybe the entire AVL tree should be removed: As far as I see it is
> only used to iterate over the values (rather than searching for a
> specific key), so a linked list is probably more appropriate.

Yeah, I agree. I have replaced the AVL tree with a linked list.

> Both "cx_check_type" and "cx_submit_xpath_values" call
> "plugin_get_ds". I think one call could be optimized away.

This is also done. Do find the updated patch attached.
Regards,
Amit

> Best regards,
> —octo
>
> [0] http://git.verplant.org/?p=collectd.git;a=blob_plain;f=src/curl_xml.c;hb=refs/heads/ag/curl_xml
> --
> Florian "octo" Forster
> Hacker in training
> GnuPG: 0x91523C3D
> http://verplant.org/

--- collectd-latest/src/curl_xml.c	Wed Jan 13 16:23:42 2010
+++ collectd-latest-mine/src/curl_xml.c	Wed Jan 13 15:39:22 2010
@@ -23,7 +23,7 @@
 #include "common.h"
 #include "plugin.h"
 #include "configfile.h"
-#include "utils_avltree.h"
+#include "utils_llist.h"
 
 #include <libxml/parser.h>
 #include <libxml/tree.h>
@@ -32,8 +32,6 @@
 #include <curl/curl.h>
 
 #define CX_DEFAULT_HOST "localhost"
-#define CX_KEY_MAGIC 0x43484b59UL /* CHKY */
-#define CX_IS_KEY(key) (key)->magic == CX_KEY_MAGIC
 
 /*
  * Private data structures
@@ -79,7 +77,7 @@
   size_t buffer_size;
   size_t buffer_fill;
 
-  c_avl_tree_t *tree; /* tree of xpath blocks */
+  llist_t *list; /* list of xpath blocks */
 };
 typedef struct cx_s cx_t; /* }}} */
@@ -138,25 +136,26 @@
   sfree (xpath);
 } /* }}} void cx_xpath_free */
 
-static void cx_tree_free (c_avl_tree_t *tree) /* {{{ */
+static void cx_list_free (llist_t *list) /* {{{ */
 {
-  char *name;
-  void *value;
+  llentry_t *le;
 
-  while (c_avl_pick (tree, (void *) &name, (void *) &value) == 0)
+  le = llist_head (list);
+  while (le != NULL)
   {
-    cx_xpath_t *key = (cx_xpath_t *) value;
+    llentry_t *le_next;
 
-    if (CX_IS_KEY (key))
-      cx_xpath_free (key);
-    else
-      cx_tree_free ((c_avl_tree_t *) value);
+    le_next = le->next;
 
-    sfree (name);
+    sfree (le->key);
+    cx_xpath_free (le->value);
+
+    le = le_next;
   }
 
-  c_avl_destroy (tree);
-} /* }}} void cx_tree_free */
+  llist_destroy (list);
+  list = NULL;
+} /* }}} void cx_list_free */
 
 static void cx_free (void *arg) /* {{{ */
 {
@@ -173,9 +172,8 @@
   curl_easy_cleanup (db->curl);
   db->curl = NULL;
 
-  if (db->tree != NULL)
-    cx_tree_free (db->tree);
-  db->tree = NULL;
+  if (db->list != NULL)
+    cx_list_free (db->list);
 
   sfree (db->buffer);
   sfree (db->instance);
@@ -190,11 +188,8 @@
   sfree (db);
 } /* }}} void cx_free */
 
-static int cx_check_type (cx_xpath_t *xpath) /* {{{ */
+static int cx_check_type (const data_set_t *ds, cx_xpath_t *xpath) /* {{{ */
 {
-  const data_set_t *ds;
-
-  ds = plugin_get_ds (xpath->type);
   if (!ds)
   {
     WARNING ("curl_xml plugin: DataSet `%s' not defined.", xpath->type);
@@ -373,7 +368,7 @@
   if ( (tmp_size == 0) && (is_table) )
   {
     WARNING ("curl_xml plugin: "
-             "relative xpath expression for 'Instance' \"%s\" doesn't match "
+             "relative xpath expression for 'InstanceFrom' \"%s\" doesn't match "
              "any of the nodes. Skipping the node.", xpath->instance);
     xmlXPathFreeObject (instance_node_obj);
     return (-1);
   }
@@ -382,7 +377,7 @@
   if (tmp_size > 1)
   {
     WARNING ("curl_xml plugin: "
-             "relative xpath expression for 'Instance' \"%s\" is expected "
+             "relative xpath expression for 'InstanceFrom' \"%s\" is expected "
              "to return only one text node. Skipping the node.", xpath->instance);
     xmlXPathFreeObject (instance_node_obj);
     return (-1);
   }
@@ -425,7
Re: [collectd] Strange SNMP collection glitches
On Wed, Jan 13, 2010 at 11:57:05AM +0100, Mirko Buffoni wrote:
> If that is the case, how could I solve this behavior, which is going
> to cause the graphs to be unusable due to the oversized scale factor?

One way is to replace the COUNTER data source with a DERIVE data source
and set the minimum value to zero. DERIVE data sources are documented at
[0]. Then, when the counter is reset to zero, the rate conversion
((new value - old value) / interval) will result in a negative value
(because the new value is zero). This negative value is then ignored due
to the minimum value being zero. The downside is that this will also
happen when the counter overflows, so you will occasionally lose a
legitimate value.

You can change the DS type of existing RRD files using
"rrdtool tune --data-source-type …" (see rrdtune(1)).

Another possibility is to set a correct maximum value. For example, if
you have a 10 Mbps line you could set the maximum to 1250000 (bytes/s).
You can change the maximum value of a DS using
"rrdtool tune --maximum …" (see rrdtune(1)).

> In the meantime, I'll try to increase traffic to speed up the counter
> to reach the overflow point.

I don't think that will cause the problem. COUNTER data sources handle
*overflows* correctly; only counter *resets* are a problem. Resets are
often caused when an interface is taken down and up again. Especially
WAN interfaces are prone to this.

Regards,
—octo

P.S.: I've CC'd the mailing list because your first question is probably
interesting for other people, too.

[0] http://collectd.org/wiki/index.php/Data_source
--
Florian "octo" Forster
Hacker in training
GnuPG: 0x91523C3D
http://verplant.org/

___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd
Re: [collectd] Strange SNMP collection glitches
At 12.07 13/01/2010 +0100, you wrote:
> On Wed, Jan 13, 2010 at 11:57:05AM +0100, Mirko Buffoni wrote:
> > If that is the case, how could I solve this behavior, which is
> > going to cause the graphs to be unusable due to the oversized scale
> > factor?
>
> One way is to replace the COUNTER data source with a DERIVE data
> source and set the minimum value to zero. DERIVE data sources are
> documented at [0]. Then, when the counter is reset to zero, the rate
> conversion ((new value - old value) / interval) will result in a
> negative value (because the new value is zero). This negative value
> is then ignored due to the minimum value being zero. The downside is
> that this will also happen when the counter overflows, so you will
> occasionally lose a legitimate value.
>
> You can change the DS type of existing RRD files using
> "rrdtool tune --data-source-type …" (see rrdtune(1)).
>
> Another possibility is to set a correct maximum value. For example,
> if you have a 10 Mbps line you could set the maximum to 1250000
> (bytes/s). You can change the maximum value of a DS using
> "rrdtool tune --maximum …" (see rrdtune(1)).

I set the correct maximum value on the RRD archives for both data
sources, tx and rx:

  rrdtool tune archive.rrd --maximum rx:240
  rrdtool tune archive.rrd --maximum tx:240

With a dump I see that the change has been made correctly. However, the
past daily/weekly/monthly graphs are unchanged and retain those
autoscaled values. I wouldn't want to change collection.cgi, but I'd
like to readjust past RRD values to the new constraints. Is this
possible?

Mirko
Re: [collectd] Strange SNMP collection glitches
Hi Mirko,

On Wed, Jan 13, 2010 at 12:36:20PM +0100, Mirko Buffoni wrote:
> With a dump I see that the change has been made correctly. However,
> the past daily/weekly/monthly graphs are unchanged and retain those
> autoscaled values. I wouldn't want to change collection.cgi, but I'd
> like to readjust past RRD values to the new constraints. Is this
> possible?

Yes, this is done by dumping the RRD file to its XML representation
(rrddump(1)) and then restoring the binary file with the --range-check
option (rrdrestore(1)).

Regards,
—octo
--
Florian "octo" Forster
Hacker in training
GnuPG: 0x91523C3D
http://verplant.org/
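For reference, the dump/restore round trip octo describes would look roughly like this (the file names are placeholders; --range-check tells rrdrestore to replace stored values outside the DS min/max with unknown):

```shell
# Dump the binary RRD to XML, then restore it with range checking so
# out-of-range past values are clamped to unknown (NaN).
rrdtool dump archive.rrd > archive.xml
rrdtool restore --range-check archive.xml archive-fixed.rrd
mv archive-fixed.rrd archive.rrd   # swap the cleaned file into place
```

Stop whatever is writing to the file (collectd or rrdcached) before swapping it, or updates made between the dump and the rename will be lost.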
[collectd] rrdcached plugin question
I just started messing with rrdcached+collectd, so it's not impossible
that I've obtusely missed something in the docs.

* Is it possible to set RRARows and RRATimespan in the rrdcached plugin
  like you can for rrdtool? If I try using them, collectd complains
  that:

  [2010-01-13 18:06:36] Plugin `rrdcached' did not register for value `RRARows'.
  [2010-01-13 18:06:36] Plugin `rrdcached' did not register for value `RRATimespan'.
  [2010-01-13 18:06:36] Plugin `rrdcached' did not register for value `RRATimespan'.
  [2010-01-13 18:06:36] Plugin `rrdcached' did not register for value `RRATimespan'.
  [2010-01-13 18:06:36] Plugin `rrdcached' did not register for value `RRATimespan'.
  [2010-01-13 18:06:36] Plugin `rrdcached' did not register for value `RRATimespan'.

And any RRD files created have the usual collectd defaults instead of
what I put. These options only seem to be registered in src/rrdtool.c. I
also tried turning on the rrdtool plugin but without a DataDir, in the
hope it might have some effect. Is there a trick here that I'm missing?

We've got non-technical folks running reports from our RRD data, and
they balk at the built-in intervals, so I've got the following because
it falls on nice, grok-able datapoints, spaced by 5 mins, 1 day, etc.

Here's my rrdcached invocation (pointers most welcome):

  /usr/bin/rrdcached -l unix:/var/run/rrdcached/rrdcached.sock -b /var/lib/collectd/rrd2 -j /var/run/rrdcached -w 3600 -z 3600 -f 7200 -t 10 -p /var/run/rrdcached/rrdcached.pid

I've been working with a super stripped-down collectd.conf to try to
rule out other things.
Here it is:

# Hostname localhost
FQDNLookup true
BaseDir /var/lib/collectd
PIDFile /var/run/collectd/collectd.pid
PluginDir /usr/lib/collectd
TypesDB /usr/share/collectd/types.db
Interval 30
ReadThreads 20

LoadPlugin logfile
<Plugin logfile>
  LogLevel error
  File STDOUT
  Timestamp true
</Plugin>

LoadPlugin network
# LoadPlugin rrdtool
LoadPlugin rrdcached

<Plugin network>
  Listen 10.20.2.2 25826
  TimeToLive 128
</Plugin>

# <Plugin rrdtool>
#   DataDir /var/lib/collectd/rrd
#   CacheTimeout 300
#   CacheFlush 600
#   RandomTimeout 60
#
#   # StepSize 30
#   # HeartBeat 120
#   WritesPerSecond 500
#   RRARows 2400
#   RRATimespan 4800
#   RRATimespan 144000
#   RRATimespan 72
#   RRATimespan 432
#   RRATimespan 4320
# </Plugin>

<Plugin rrdcached>
  DataDir /var/lib/collectd/rrd2
  CreateFiles true
  DaemonAddress unix:/var/run/rrdcached/rrdcached.sock
  # StepSize 30
  # HeartBeat 120
  WritesPerSecond 500
  RRARows 2400
  RRATimespan 4800
  RRATimespan 144000
  RRATimespan 72
  RRATimespan 432
  RRATimespan 4320
</Plugin>
[collectd] Processes plugin
Since the 4.9.0 upgrade, I see this popping up on all of my boxes:

  Jan 13 20:35:39 server collectd[8501]: rrdtool plugin: rrd_update_r (/var/lib/collectd/rrd/server/processes-httpd/ps_disk_octets.rrd) failed: not a simple integer: '-1719325917'

It's not happening every collectd interval, but it looks like once every
3-7 intervals, and it always seems to be the ps_disk_octets metric.
Here's a grab of the non-NaN rows for processes-httpd/ps_disk_octets.rrd
(MAX):

  1263426420: 8.9397714667e+06 8.8867349333e+06
  1263426480: 8.9397714667e+06 8.8867349333e+06
  1263426540: 2.8722361756e+06 2.8611358122e+06
  1263426600: 2.8722361756e+06 2.8611358122e+06
  1263429840: 6.9840988933e+06 6.9629671933e+06
  1263429900: 6.9840988933e+06 6.9629671933e+06
  1263429960: 4.1813118933e+06 4.1651611100e+06
  1263430020: 2.4956140633e+06 2.4853952000e+06
  1263432180: 1.5469198067e+07 1.5460964333e+07
  1263432240: 3.5285845300e+06 3.5101065467e+06

There are big holes there, and the NaN rows are about 60% of the file.
The biggest recorded value is 28858203.767. The lowest number reported
in the error message is -2147483522 (they range all the way up to -5).
Presumably something's overflowing :)

Other background: these are all Debian Etch boxes running collectd
4.9.0. They're all 32-bit, all running fairly new Linux kernels, all
with CONFIG_TASK_IO_ACCOUNTING=y. The example above is from a box
running 2.6.32.3, but I see this happening on other boxes regardless of
the kernel (even down to 2.6.29.x and beyond).

The above example is a pretty heavily loaded web server. Though it's
serving *only* read-only web traffic, it does write a good deal of logs
out, so it's not impossible for it to have very high IO numbers.

This isn't a big deal, just a minor annoyance, but I figured I'd mention
it.