Re: [Ganglia-general] Extending the format of gmetad.conf
Jesse Becker wrote:
> On Wed, Jan 6, 2010 at 12:01, Ofer Inbar c...@a.org wrote:
>> Daniel Pocock dan...@pocock.com.au wrote:
>>> and 3.2 can possibly go to a full XML format gmetad.conf with more advanced templates, etc.
>>
>> Please tell me that's not being considered? XML is a horrible lousy format for these kinds of config files.
>
> I concur. I *STRONGLY* object to using XML for configuration files. It's not so bad for data, but config files.

I don't think it's a question of whether XML is good or bad, and I never intended to exclude other possibilities. I think we need to consider various factors:

- some configuration settings (e.g. setuid, debug level) probably require little more than the current file format; these settings are probably not changed often, and even then, only an advanced user would change them
- other configuration (e.g. the list of clusters, the RRAs or the proposed templates) may change more often and may need to be changed by some kind of tool

Data in the latter category may be appropriate for XML. It may also be appropriate to keep it in an RDBMS. Maybe even a DSO approach is needed, where people can code their own configuration module to read a custom source.

Before making such decisions, we would look at:

- what kind of tools would be useful for working with the data: a web interface? A generic XML editor? A custom tool written in Java? Or would users like to transform data from another source?
- is it important for users to maintain the files manually, or will the focus shift to tools, web interface or config files generated from some other enterprise data source?
-- This SF.Net email is sponsored by the Verizon Developer Community Take advantage of Verizon's best-in-class app development support A streamlined, 14 day to market process makes app distribution fast and easy Join now and get one step closer to millions of Verizon customers http://p.sf.net/sfu/verizon-dev2dev ___ Ganglia-general mailing list Ganglia-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/ganglia-general
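The reply above raises config files generated from some other enterprise data source. One way to read that is a small exporter that writes only the frequently changing part (the cluster list) and leaves rarely edited settings such as setuid in a hand-maintained file. A minimal sketch, assuming a hypothetical `ClusterRecord` shape for whatever the inventory exports; only the `data_source "name" [polling interval] host:port ...` line format comes from gmetad.conf itself. Names containing a double quote are rejected rather than escaped, because this sketch does not assume any escaping rule in the gmetad.conf parser.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterRecord:
    """One cluster as a hypothetical inventory system might export it."""
    name: str
    hosts: list[str]                          # "host:port" entries
    polling_interval: Optional[int] = None    # seconds; omitted from the line when None


def render_data_sources(records: list[ClusterRecord]) -> str:
    """Render one gmetad.conf data_source line per cluster."""
    lines = []
    for rec in records:
        if '"' in rec.name:
            # No escaping rule is assumed for gmetad.conf, so refuse instead of guessing.
            raise ValueError(f"cluster name contains a double quote: {rec.name!r}")
        if not rec.hosts:
            raise ValueError(f"cluster {rec.name!r} has no hosts")
        parts = ["data_source", f'"{rec.name}"']
        if rec.polling_interval is not None:
            parts.append(str(rec.polling_interval))
        parts.extend(rec.hosts)
        lines.append(" ".join(parts))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_data_sources([
        ClusterRecord("my cluster", ["address1:8649", "address2:8649"], 15),
        ClusterRecord("web", ["web1:8649"]),
    ]), end="")
```

How the generated lines get merged with the hand-maintained settings is left to whatever deployment tooling is already in use.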
Re: [Ganglia-general] Extending the format of gmetad.conf
Daniel Pocock wrote:
> - is it important for users to maintain the files manually, or will the focus shift to tools, web interface or config files generated from some other enterprise data source?

I've been content with the existing file format for the 7 or so years I've been running ganglia. At this point, if changes were going to be made, I think making it consistent with gmond's configuration format would be a noble effort.

I'm not a big fan of any kind of GUI to maintain a text file; many other software packages have made this mistake. The problems it creates are a config file that reads like line noise, or hidden options in the config that never get GUI elements to control them.
Re: [Ganglia-general] Extending the format of gmetad.conf
Carlo Marcelo Arenas Belon wrote:
> On Mon, Jan 04, 2010 at 06:55:36AM -0500, Jesse Becker wrote:
>> On Mon, Jan 4, 2010 at 03:46, Carlo Marcelo Arenas Belon care...@sajinet.com.pe wrote:
>>>> My goal is to allow different sets of RRAs for different sources, while making sure the existing file format remains valid.
>>>
>>> why do you want to have this? what is the use case for having different metric storage frequencies per cluster and why can't it be done with independent gmetad instances instead?
>>
>> I can think of reasons why you'd want different frequencies for the same metric, mostly having to do with required data retention policies and lack of resources (disk space). It could be done with different gmetad processes, but that gets complicated for a simple cluster (multiple gmetad polling the same gmond, the same data is displayed in two different locations).
>
> Of course I can think of reasons why that might be something you would want to have, and that is why I said that might be needed in the long run if those reasons are genuine, but I will be surprised if there is a reason good enough to do that from the very beginning when using multiple gmetad would solve it for now IMHO.

Multiple gmetad is technically a viable solution, but slightly harder to support - I'd rather look at a box and ask `is gmetad running?' than `are the right number of gmetads running?'

> The point is that the syntactical sugar to make that work would be far more complicated and difficult to do in a 3.1 compatible way than just adding templates and therefore I would tend to believe that it would make more sense as a 3.2 feature, while having different RRAs independently of which datasource has been in the wishlist since even before 3.1.0 was released and would be something you would instead want backported ASAP (probably even to 3.0 if there is demand)

I believe it can be done in a way that is 3.1 compatible, and 3.2 can possibly go to a full XML format gmetad.conf with more advanced templates, etc.

>>> if you are talking about different metric storage frequencies per metric as it seems to be implied later (and which is a feature long in the wishlist) then wouldn't it be safe to assume you want that storage for that metric regardless of source? If that is the case, it will simplify the implementation and will only require something like RRAs_template as shown in d and not need a, b, or c at all (or at least not as part of the first implementation). currently in data_source the polling interval is optional and so the same could be done with the template to apply in the long run, but complicating the configuration parser, for IMHO no really good reason.
>>>
>>> using a script is definitely interesting because of the flexibility it allows for, but as mentioned before a problem because of the additional forking required and also problematic because it will keep part of the logic outside gmetad.
>>
>> Perhaps I'm misunderstanding how using a separate script would work, but there would only be a fork storm during initial RRD creation, correct?
>
> it depends on what the script does, but that is correct in the case that the script is only returning the RRAs back to gmetad as you suggested.

The script would only be called when a new RRD is needed, so it is not so bad. A module-based solution could be implemented too.

Another solution for those who want the logic outside of gmetad: maybe we need a gmetad.conf option that tells it NOT to create any RRD that doesn't exist. It would then log an error the first time it sees a metric for that RRD. The user could then run scripts outside of gmetad's control to do things like parsing the log and creating the RRD files they want.

Also, there is always an overhead in RRD creation - one possible algorithm is to create a whole bunch of blank RRD files before starting gmetad and then rename them as they are required.

> still the disadvantage (as mentioned above) of not being able to know from just reading the gmetad.conf which RRA apply on each case still applies and would probably imply that the best way to do this will be to make gmetad modular (just like gmond) and then allow it to write its own configuration or use one by default that could be used as a starting point just like `gmond -t` allows for.

I'm not sure that having something people can customize is a disadvantage.

>> I had assumed that the current behavior of keeping existing RRD files would remain. Thus, the only time we would really have to worry about forking off hundreds/thousands of processes would be when a new cluster is created, or when the RRD files are all removed for some reason. Under normal operating circumstances, the RRD files already exist, so there's no need to run the creation script.
>
> or when gmetad is restarted and have to again figure out which RRA apply on each case for the updates and unless gmetad.conf has all
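The two ideas in the reply above, a mode that never creates missing RRDs and a pool of blank RRD files renamed into place on demand, can be sketched together. This is a minimal sketch, not gmetad code: `create_blank` is a hypothetical hook (in practice it would presumably wrap `rrdtool create` with one fixed set of RRAs), and because an RRD's archives are fixed when the file is created, one pool can only serve one RRA set. `os.rename` is only atomic when the pool directory and the RRD tree are on the same filesystem.

```python
import logging
import os
import uuid
from typing import Callable

log = logging.getLogger("rrd_pool_sketch")


class BlankRRDPool:
    """Pre-created blank RRD files, all sharing one RRA layout."""

    def __init__(self, pool_dir: str, create_blank: Callable[[str], None]):
        self.pool_dir = pool_dir
        self.create_blank = create_blank      # hypothetical hook, e.g. wrapping `rrdtool create`
        os.makedirs(pool_dir, exist_ok=True)

    def fill(self, count: int) -> None:
        # Run ahead of time (e.g. before gmetad starts) so the creation cost is paid up front.
        for _ in range(count):
            self.create_blank(os.path.join(self.pool_dir, f"blank-{uuid.uuid4().hex}.rrd"))

    def claim(self, target: str) -> bool:
        """Rename one blank file to `target`; return False if the pool is empty."""
        names = sorted(os.listdir(self.pool_dir))
        if not names:
            return False
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        os.rename(os.path.join(self.pool_dir, names[0]), target)
        return True


class RRDProvider:
    """Decides what happens when an update arrives for an RRD that does not exist."""

    def __init__(self, pool: BlankRRDPool, create_missing: bool = True):
        self.pool = pool
        self.create_missing = create_missing
        self._reported: set[str] = set()

    def ensure(self, path: str) -> bool:
        """Return True if `path` exists afterwards and can be updated."""
        if os.path.exists(path):
            return True
        if not self.create_missing:
            # "Never create" mode: log once per file, leave creation to external scripts.
            if path not in self._reported:
                log.error("RRD %s does not exist and RRD creation is disabled", path)
                self._reported.add(path)
            return False
        if not self.pool.claim(path):
            # Pool exhausted: fall back to creating the file in place.
            os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
            self.pool.create_blank(path)
        return True
```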
Re: [Ganglia-general] Extending the format of gmetad.conf
On Wed, Jan 6, 2010 at 12:01, Ofer Inbar c...@a.org wrote:
> Daniel Pocock dan...@pocock.com.au wrote:
>> and 3.2 can possibly go to a full XML format gmetad.conf with more advanced templates, etc.
>
> Please tell me that's not being considered? XML is a horrible lousy format for these kinds of config files.

I concur. I *STRONGLY* object to using XML for configuration files. It's not so bad for data, but config files.

-- 
Jesse Becker
Every cloud has a silver lining, except for the mushroom-shaped ones, which come lined with strontium-90.
Re: [Ganglia-general] Extending the format of gmetad.conf
On Mon, Dec 28, 2009 at 10:16:28PM +, Daniel Pocock wrote:
> I'm looking at extending the gmetad.conf format, while still making sure that it can read the existing config files.

adding a new configuration option would be the easiest way to prevent any backward incompatible change which will then force this feature to be 3.2+ only.

> My goal is to allow different sets of RRAs for different sources, while making sure the existing file format remains valid.

why do you want to have this? what is the use case for having different metric storage frequencies per cluster and why can't it be done with independent gmetad instances instead?

if you are talking about different metric storage frequencies per metric as it seems to be implied later (and which is a feature long in the wishlist) then wouldn't it be safe to assume you want that storage for that metric regardless of source? If that is the case, it will simplify the implementation and will only require something like RRAs_template as shown in d and not need a, b, or c at all (or at least not as part of the first implementation). currently in data_source the polling interval is optional and so the same could be done with the template to apply in the long run, but complicating the configuration parser, for IMHO no really good reason.

using a script is definitely interesting because of the flexibility it allows for, but as mentioned before a problem because of the additional forking required and also problematic because it will keep part of the logic outside gmetad.

Carlo
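Carlo's suggestion above is that per-metric RRAs could be handled by option d alone, applied regardless of source. A minimal sketch of that lookup, under assumptions the thread does not settle: patterns are matched against the whole RRD path with `re.fullmatch`, a later matching rule overrides an earlier one, and a match rule naming an undefined template is an error. The class and method names are hypothetical.

```python
import re
from typing import Iterable, Sequence

Template = tuple[str, ...]


class MetricTemplates:
    """RRA templates selected by regex on the RRD path, for every source alike."""

    def __init__(self, default_rras: Sequence[str]):
        self.default_rras: Template = tuple(default_rras)
        self.templates: dict[str, Template] = {}
        self.rules: list[tuple[re.Pattern[str], str]] = []   # kept in file order

    def define(self, name: str, rras: Sequence[str]) -> None:
        """Equivalent of: RRAs_template <name> <RRA> ..."""
        self.templates[name] = tuple(rras)

    def match(self, name: str, patterns: Iterable[str]) -> None:
        """Equivalent of: RRAs_template_match <name> <regex> ..."""
        if name not in self.templates:
            raise KeyError(f"undefined RRAs template {name!r}")
        for pattern in patterns:
            self.rules.append((re.compile(pattern), name))

    def rras_for(self, rrd_path: str) -> Template:
        chosen = self.default_rras
        for regex, name in self.rules:
            # The source directory is part of the path but never tested on its own,
            # so a pattern such as ".*/cpu_num.rrd" applies to every source.
            if regex.fullmatch(rrd_path):
                chosen = self.templates[name]   # later matching rules win
        return chosen
```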
Re: [Ganglia-general] Extending the format of gmetad.conf
On Mon, Jan 4, 2010 at 03:46, Carlo Marcelo Arenas Belon care...@sajinet.com.pe wrote:
>> My goal is to allow different sets of RRAs for different sources, while making sure the existing file format remains valid.
>
> why do you want to have this? what is the use case for having different metric storage frequencies per cluster and why can't it be done with independent gmetad instances instead?

I can think of reasons why you'd want different frequencies for the same metric, mostly having to do with required data retention policies and lack of resources (disk space). It could be done with different gmetad processes, but that gets complicated for a simple cluster (multiple gmetad polling the same gmond, the same data is displayed in two different locations).

> if you are talking about different metric storage frequencies per metric as it seems to be implied later (and which is a feature long in the wishlist) then wouldn't it be safe to assume you want that storage for that metric regardless of source? If that is the case, it will simplify the implementation and will only require something like RRAs_template as shown in d and not need a, b, or c at all (or at least not as part of the first implementation). currently in data_source the polling interval is optional and so the same could be done with the template to apply in the long run, but complicating the configuration parser, for IMHO no really good reason.
>
> using a script is definitely interesting because of the flexibility it allows for, but as mentioned before a problem because of the additional forking required and also problematic because it will keep part of the logic outside gmetad.

Perhaps I'm misunderstanding how using a separate script would work, but there would only be a fork storm during initial RRD creation, correct? I had assumed that the current behavior of keeping existing RRD files would remain. Thus, the only time we would really have to worry about forking off hundreds/thousands of processes would be when a new cluster is created, or when the RRD files are all removed for some reason. Under normal operating circumstances, the RRD files already exist, so there's no need to run the creation script.

-- 
Jesse Becker
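Jesse's point above is that, if existing RRD files are kept, the creation script only runs when a file is missing. A minimal sketch of that guard, combined with the suggestion elsewhere in the thread that the script print RRA definitions rather than create files itself. The calling convention (script path, source name and metric name as arguments; one `RRA:...` definition per output line) and the 10-second timeout are assumptions; `runner` is injectable so the logic can be exercised without forking.

```python
import os
import subprocess
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], str]


def run_script(argv: Sequence[str]) -> str:
    # One fork/exec per call; check=True raises CalledProcessError on a non-zero exit.
    return subprocess.run(list(argv), capture_output=True, text=True,
                          check=True, timeout=10).stdout


def rras_for_new_rrd(rrd_path: str, script: str, source: str, metric: str,
                     runner: Runner = run_script) -> Optional[list[str]]:
    """Return RRA definitions for an RRD that still has to be created, or None.

    Existing RRD files are kept as they are, so the script only runs for
    missing files (a new cluster, or an RRD tree that was wiped).
    """
    if os.path.exists(rrd_path):
        return None
    output = runner([script, source, metric])
    rras = [line.strip() for line in output.splitlines() if line.strip()]
    # Accept only lines that look like RRA definitions.
    bad = [line for line in rras if not line.startswith("RRA:")]
    if not rras or bad:
        raise ValueError(f"{script} returned unusable RRA definitions: {output!r}")
    return rras
```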
Re: [Ganglia-general] Extending the format of gmetad.conf
On Mon, Jan 04, 2010 at 06:55:36AM -0500, Jesse Becker wrote:
> On Mon, Jan 4, 2010 at 03:46, Carlo Marcelo Arenas Belon care...@sajinet.com.pe wrote:
>>> My goal is to allow different sets of RRAs for different sources, while making sure the existing file format remains valid.
>>
>> why do you want to have this? what is the use case for having different metric storage frequencies per cluster and why can't it be done with independent gmetad instances instead?
>
> I can think of reasons why you'd want different frequencies for the same metric, mostly having to do with required data retention policies and lack of resources (disk space). It could be done with different gmetad processes, but that gets complicated for a simple cluster (multiple gmetad polling the same gmond, the same data is displayed in two different locations).

Of course I can think of reasons why that might be something you would want to have, and that is why I said that might be needed in the long run if those reasons are genuine, but I will be surprised if there is a reason good enough to do that from the very beginning when using multiple gmetad would solve it for now IMHO.

The point is that the syntactical sugar to make that work would be far more complicated and difficult to do in a 3.1 compatible way than just adding templates and therefore I would tend to believe that it would make more sense as a 3.2 feature, while having different RRAs independently of which datasource has been in the wishlist since even before 3.1.0 was released and would be something you would instead want backported ASAP (probably even to 3.0 if there is demand)

>> if you are talking about different metric storage frequencies per metric as it seems to be implied later (and which is a feature long in the wishlist) then wouldn't it be safe to assume you want that storage for that metric regardless of source? If that is the case, it will simplify the implementation and will only require something like RRAs_template as shown in d and not need a, b, or c at all (or at least not as part of the first implementation). currently in data_source the polling interval is optional and so the same could be done with the template to apply in the long run, but complicating the configuration parser, for IMHO no really good reason.
>>
>> using a script is definitely interesting because of the flexibility it allows for, but as mentioned before a problem because of the additional forking required and also problematic because it will keep part of the logic outside gmetad.
>
> Perhaps I'm misunderstanding how using a separate script would work, but there would only be a fork storm during initial RRD creation, correct?

it depends on what the script does, but that is correct in the case that the script is only returning the RRAs back to gmetad as you suggested.

still the disadvantage (as mentioned above) of not being able to know from just reading the gmetad.conf which RRA apply on each case still applies and would probably imply that the best way to do this will be to make gmetad modular (just like gmond) and then allow it to write its own configuration or use one by default that could be used as a starting point just like `gmond -t` allows for.

> I had assumed that the current behavior of keeping existing RRD files would remain. Thus, the only time we would really have to worry about forking off hundreds/thousands of processes would be when a new cluster is created, or when the RRD files are all removed for some reason. Under normal operating circumstances, the RRD files already exist, so there's no need to run the creation script.

or when gmetad is restarted and have to again figure out which RRA apply on each case for the updates and unless gmetad.conf has all that information somehow in a static way (by using for example the modular solution instead of just a script).

Carlo
Re: [Ganglia-general] Extending the format of gmetad.conf
I'm definitely in favor of the pattern-oriented designs, and I do agree this is a really great idea. I'm more interested in changing the RRAs per-metric than per-source, so any solution that only solved the per-source part of the problem would be less good.

The script or callback solution makes good sense to me, too -- but I wonder about speed. Some of our sources generate 25000 RRD files in the first minute of that source coming up. That's a fair bit of forking around if it's a script-only solution.

-- ReC

On Dec 28, 2009, at 3:49 PM, Jesse Becker wrote:
> A few random thoughts inline below. Regardless of the specifics, I think this is a really great idea.
>
> On Mon, Dec 28, 2009 at 17:16, Daniel Pocock dan...@pocock.com.au wrote:
>> I'm looking at extending the gmetad.conf format, while still making sure that it can read the existing config files.
>>
>> There are two particular lines that interest me:
>>
>>     RRAs "RRA:AVERAGE:0.5:1:244" "RRA:AVERAGE:0.5:24:244" "RRA:AVERAGE:0.5:168:244" "RRA:AVERAGE:0.5:672:244" \
>>          "RRA:AVERAGE:0.5:5760:374"
>>
>>     data_source "my cluster" [polling interval] address1:port address2:port ...
>>
>> My goal is to allow different sets of RRAs for different sources, while making sure the existing file format remains valid.
>>
>> A couple of ideas I've had:
>>
>> a) allow the RRAs line to be repeated - the most recent version will be applied to all subsequent data sources
>
> This is a good baseline rule: Last match wins.
>
>> b) a new option, RRAs_for_source, e.g.
>>
>>     RRAs_for_source "my cluster" "RRA:MAX:0.5:1:244" "RRA:MAX:0.5:24:244"
>
> How do you handle cases where "my cluster" isn't defined? Is that a fatal error, or a warning? Does the last (valid) match still win?
>
>> c) define templates:
>>
>>     RRAs_template tmpl1 "RRA:MAX:0.5:1:244" "RRA:MAX:0.5:24:244"
>
> Can templates also be redefined? If there are templates, there has to be a precedence for what matches are valid. I'd propose that last match wins, based on this hierarchy:
>
>   * default definition
>   * data_source
>   * metric name
>   * filename (is this *always* 100% derivable from the metric name?)
>   * script
>
> My slight concern about templates is that they could make the configuration very complicated and cause unexpected behavior. On the other hand, having to define and redefine rules for every single metric that needs a custom RRA, multiple times for each data_source, is just ugly. :)
>
>> d) regex based filename matching:
>>
>>     RRAs_template tmpl1 "RRA:MAX:0.5:100:240" "RRA:MAX:0.5:240:30"
>>     RRAs_template_match tmpl1 .*/cpu_num.rrd .*/swap_total.rrd
>
> I like this. However, it might be simpler to just limit the match to the metric name, instead of having to deal with the preceding directory path and trailing .rrd string.
>
>> e) call an external script for RRD creation, and pass it the source name, metric name, etc:
>>
>>     RRD_create_script /usr/local/bin/create-my-rrd
>
> I like this, since it is inherently the most flexible, but it opens us to several security issues. I would suggest that the script return the definition of the RRA, as opposed to creating the .rrd files directly. If this option is implemented, a basic and officially supported script should be included.
>
> -- 
> Jesse Becker
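ReC's message above worries about forking a script for each of the 25000 RRD files a source can produce in its first minute. If the RRA choice depends only on the metric name (the per-metric case that message cares most about), the script's answer can be cached per metric: with a hypothetical split of 500 hosts times 50 metrics, that is 50 script runs instead of 25000. A minimal sketch; the script's calling convention (metric name as the only argument, one RRA definition per output line) is an assumption, and `runner` is injectable for testing.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def run_script(argv: Sequence[str]) -> str:
    # One fork/exec per call; check=True raises CalledProcessError if the script fails.
    return subprocess.run(list(argv), capture_output=True, text=True,
                          check=True, timeout=10).stdout


class CachedRRAScript:
    """Ask an external script for RRAs at most once per metric name."""

    def __init__(self, script: str, runner: Runner = run_script):
        self.script = script
        self.runner = runner
        self._cache: dict[str, tuple[str, ...]] = {}

    def rras_for(self, metric: str) -> tuple[str, ...]:
        if metric not in self._cache:
            output = self.runner([self.script, metric])
            self._cache[metric] = tuple(
                line.strip() for line in output.splitlines() if line.strip()
            )
        return self._cache[metric]
```

The trade-off is that a per-source or per-host decision can no longer be made by the script; a cache key of (source, metric) would restore that at the cost of more forks.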
[Ganglia-general] Extending the format of gmetad.conf
I'm looking at extending the gmetad.conf format, while still making sure that it can read the existing config files.

There are two particular lines that interest me:

    RRAs "RRA:AVERAGE:0.5:1:244" "RRA:AVERAGE:0.5:24:244" "RRA:AVERAGE:0.5:168:244" "RRA:AVERAGE:0.5:672:244" \
         "RRA:AVERAGE:0.5:5760:374"

    data_source "my cluster" [polling interval] address1:port address2:port ...

My goal is to allow different sets of RRAs for different sources, while making sure the existing file format remains valid.

A couple of ideas I've had:

a) allow the RRAs line to be repeated - the most recent version will be applied to all subsequent data sources

b) a new option, RRAs_for_source, e.g.

    RRAs_for_source "my cluster" "RRA:MAX:0.5:1:244" "RRA:MAX:0.5:24:244"

c) define templates:

    RRAs_template tmpl1 "RRA:MAX:0.5:1:244" "RRA:MAX:0.5:24:244"

and invoke the template `tmpl1' using an @ or some other special symbol, which is optional:

    data_source "my cluster" [...@tmpl1] [polling interval] address1:port address2:port ...

or

    data_source "my cluster" [...@rras=tmpl1] [polling interval] address1:port address2:port ...

d) regex based filename matching:

    RRAs_template tmpl1 "RRA:MAX:0.5:100:240" "RRA:MAX:0.5:240:30"
    RRAs_template_match tmpl1 .*/cpu_num.rrd .*/swap_total.rrd

e) call an external script for RRD creation, and pass it the source name, metric name, etc:

    RRD_create_script /usr/local/bin/create-my-rrd

Any preferences for this? Anyone facing a similar requirement?
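Options a and b above can share one pass over the file. A minimal sketch, not gmetad's parser: backslash continuations are joined first, lines are split shell-style with `shlex` (so quoted names and RRA strings stay intact, though an unquoted backslash would be consumed), and every directive other than RRAs, RRAs_for_source and data_source is ignored. One assumption matters for keeping existing files valid: an existing file may put its single RRAs line after the data_source lines, so sources that appear before any RRAs line fall back to the first RRAs line in the file. An RRAs_for_source naming an undefined source only logs a warning here; making it fatal would be an equally reasonable choice.

```python
import logging
import shlex
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("gmetad_conf_sketch")


@dataclass
class DataSource:
    name: str
    args: list[str]                  # optional polling interval, then host:port entries
    rras: Optional[list[str]]        # None until resolved


def parse_gmetad_conf(text: str) -> dict[str, DataSource]:
    text = text.replace("\\\n", " ")            # join backslash-continued lines
    sources: dict[str, DataSource] = {}
    overrides: dict[str, list[str]] = {}
    current: Optional[list[str]] = None         # most recent RRAs line so far
    first: Optional[list[str]] = None           # first RRAs line in the file

    for line in text.splitlines():
        words = shlex.split(line, comments=True)
        if not words:
            continue
        key, args = words[0], words[1:]
        if key == "RRAs":                       # (a) repeatable, applies to later sources
            current = args
            if first is None:
                first = args
        elif key == "RRAs_for_source":          # (b) override for one named source
            overrides[args[0]] = args[1:]
        elif key == "data_source":
            sources[args[0]] = DataSource(args[0], args[1:], current)

    for ds in sources.values():
        if ds.rras is None:
            # Sources defined before any RRAs line take the first one, so a file
            # with a single RRAs line anywhere applies it to every source.
            ds.rras = first or []               # [] stands for gmetad's built-in defaults
    for name, rras in overrides.items():
        if name in sources:
            sources[name].rras = rras
        else:
            log.warning("RRAs_for_source names undefined data_source %r", name)
    return sources
```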
Re: [Ganglia-general] Extending the format of gmetad.conf
A few random thoughts inline below. Regardless of the specifics, I think this is a really great idea.

On Mon, Dec 28, 2009 at 17:16, Daniel Pocock dan...@pocock.com.au wrote:
> I'm looking at extending the gmetad.conf format, while still making sure that it can read the existing config files.
>
> There are two particular lines that interest me:
>
>     RRAs "RRA:AVERAGE:0.5:1:244" "RRA:AVERAGE:0.5:24:244" "RRA:AVERAGE:0.5:168:244" "RRA:AVERAGE:0.5:672:244" \
>          "RRA:AVERAGE:0.5:5760:374"
>
>     data_source "my cluster" [polling interval] address1:port address2:port ...
>
> My goal is to allow different sets of RRAs for different sources, while making sure the existing file format remains valid.
>
> A couple of ideas I've had:
>
> a) allow the RRAs line to be repeated - the most recent version will be applied to all subsequent data sources

This is a good baseline rule: Last match wins.

> b) a new option, RRAs_for_source, e.g.
>
>     RRAs_for_source "my cluster" "RRA:MAX:0.5:1:244" "RRA:MAX:0.5:24:244"

How do you handle cases where "my cluster" isn't defined? Is that a fatal error, or a warning? Does the last (valid) match still win?

> c) define templates:
>
>     RRAs_template tmpl1 "RRA:MAX:0.5:1:244" "RRA:MAX:0.5:24:244"

Can templates also be redefined? If there are templates, there has to be a precedence for what matches are valid. I'd propose that last match wins, based on this hierarchy:

  * default definition
  * data_source
  * metric name
  * filename (is this *always* 100% derivable from the metric name?)
  * script

My slight concern about templates is that they could make the configuration very complicated and cause unexpected behavior. On the other hand, having to define and redefine rules for every single metric that needs a custom RRA, multiple times for each data_source, is just ugly. :)

> d) regex based filename matching:
>
>     RRAs_template tmpl1 "RRA:MAX:0.5:100:240" "RRA:MAX:0.5:240:30"
>     RRAs_template_match tmpl1 .*/cpu_num.rrd .*/swap_total.rrd

I like this. However, it might be simpler to just limit the match to the metric name, instead of having to deal with the preceding directory path and trailing .rrd string.

> e) call an external script for RRD creation, and pass it the source name, metric name, etc:
>
>     RRD_create_script /usr/local/bin/create-my-rrd

I like this, since it is inherently the most flexible, but it opens us to several security issues. I would suggest that the script return the definition of the RRA, as opposed to creating the .rrd files directly. If this option is implemented, a basic and officially supported script should be included.

-- 
Jesse Becker
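Jesse's proposed precedence (default definition, then data_source, then metric name, then filename, with last match wins) can be made concrete as a stable sort by level followed by a scan in which every matching rule replaces the previous choice. A minimal sketch with hypothetical `Level` and `Rule` types: data_source rules compare the source name exactly, metric-name and filename rules use anchored regexes (matching the bare metric name, as suggested above, avoids the directory prefix and .rrd suffix), and the script level is left out.

```python
import re
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Sequence


class Level(IntEnum):
    """Precedence levels, lowest first; a match on a later level overrides earlier ones."""
    DEFAULT = 0
    DATA_SOURCE = 1
    METRIC_NAME = 2
    FILENAME = 3


@dataclass(frozen=True)
class Rule:
    level: Level
    pattern: str                 # source name, metric-name regex or filename regex
    rras: tuple[str, ...]


def rule_matches(rule: Rule, source: str, metric: str, filename: str) -> bool:
    if rule.level == Level.DEFAULT:
        return True
    if rule.level == Level.DATA_SOURCE:
        return rule.pattern == source
    if rule.level == Level.METRIC_NAME:
        return re.fullmatch(rule.pattern, metric) is not None
    return re.fullmatch(rule.pattern, filename) is not None


def resolve(rules: Sequence[Rule], source: str, metric: str,
            filename: str) -> Optional[tuple[str, ...]]:
    """Return the RRAs of the last matching rule, scanning levels lowest first."""
    chosen: Optional[tuple[str, ...]] = None
    # sorted() is stable: rules keep their file order within a level, so the last
    # matching rule of the highest matching level is the one that survives.
    for rule in sorted(rules, key=lambda r: r.level):
        if rule_matches(rule, source, metric, filename):
            chosen = rule.rras
    return chosen
```

Because the sort is by level first, a metric-name rule written early in the file still beats a data_source rule written later, which is one reading of "last match wins, based on this hierarchy".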