Hi all,

The attached/linked diagram [1] shows how the GangliaResourceMonitorFactory will be integrated with AssignmentMonitor to calculate load. AssignmentMonitor keeps each node's load in a static HashMap (<nodeId, load>), so I guess the *loadMap should be updated periodically* (e.g., at a 1 min interval) by parsing the Ganglia XML, right?
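A minimal sketch of such a periodic refresh, assuming the Ganglia XML parsing is hidden behind a supplier that returns <nodeId, load> pairs. LoadMapRefresher and its method names are hypothetical and are not existing OODT classes; only the java.util.concurrent calls are real.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper (not part of OODT): refreshes a <nodeId, load> map on a fixed interval. */
class LoadMapRefresher {
    private final Map<String, Integer> loadMap = new ConcurrentHashMap<>();
    // Assumption: the supplier downloads and parses the Ganglia XML.
    private final Supplier<Map<String, Integer>> loadSource;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    LoadMapRefresher(Supplier<Map<String, Integer>> loadSource) {
        this.loadSource = loadSource;
    }

    /** Pulls fresh loads once and merges them into the map. */
    void refreshOnce() {
        loadMap.putAll(loadSource.get());
    }

    /** Starts refreshing every periodSeconds, beginning immediately. */
    void start(long periodSeconds) {
        scheduler.scheduleAtFixedRate(this::safeRefresh, 0, periodSeconds, TimeUnit.SECONDS);
    }

    private void safeRefresh() {
        try {
            refreshOnce();
        } catch (RuntimeException e) {
            // Keep the last known loads; an uncaught exception would
            // cancel all further scheduled runs.
        }
    }

    void stop() {
        scheduler.shutdownNow();
    }

    /** Returns the last known load, or null if the node has not been seen. */
    Integer getLoad(String nodeId) {
        return loadMap.get(nodeId);
    }
}
```

Calling start(60) would give the 1 min interval mentioned above. A ConcurrentHashMap is used here because the refresh thread and the assignment code would touch the map at the same time.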

The load we need is not a traditional load value; it is a value that says how many of these jobs can fit on a machine. As I understand it, the calculation should take the most relevant metrics, apply a weight to each of them, and then normalize the resulting load value to the range 0 to 1.

I guess the following are the most relevant of the default Ganglia metrics for the calculation:

load_one = one minute load average
load_five = five minute load average
load_fifteen = fifteen minute load average
mem_free = amount of available memory
swap_free = amount of available swap memory

These are the models I currently have in mind:

(I). Weight the 1 min, 5 min and 15 min load numbers and normalize the value.
(II). Add the mem_free and swap_free metrics to the calculation in model I.

According to [3], more weight should go to either the 5 min or the 15 min value.

#1. *But how can I rationalize the weights I give?*

#2. Furthermore, what is the capacity of a node? Since we are talking about *normalization, what is the role of this capacity?* How does it affect this calculation? (When assigning load to a particular node, it calculates something like "if (loadValue <= (loadCap - curLoad))", where loadCap = node.getCapacity() and curLoad = loadMap.get(node.getNodeId()).intValue().)

Other considerations

#3. What should the value be if the node is offline? We can tell that a particular node is offline from its TN and TMAX values.
In gmetad, a host is considered offline and is ignored if TN > 4 * TMAX [2]. (TN: the number of seconds since the metric was last updated; TMAX: the maximum time in seconds between gmetric calls.)

*The default Ganglia metrics are listed here; your thoughts are welcome.*

disk_free = Disk Space Available
machine_type = System architecture
bytes_out = Number of bytes out per second
gexec = gexec available
proc_total = Total number of processes
cpu_nice = Percentage of CPU utilization that occurred while executing at the user level with nice priority
pkts_in = Packets in per second
cpu_speed = CPU Speed in terms of MHz
boottime = The last time that the system was started
cpu_wio = Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request
os_name = Operating system name
load_one = One minute load average
os_release = Operating system release date
disk_total = Total available disk space
cpu_user = Percentage of CPU utilization that occurred while executing at the user level
cpu_idle = Percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request
swap_free = Amount of available swap memory
mem_cached = Amount of cached memory
pkts_out = Packets out per second
load_five = Five minute load average
cpu_num = Total number of CPUs
load_fifteen = Fifteen minute load average
mem_free = Amount of available memory
cpu_system = Percentage of CPU utilization that occurred while executing at the system level
proc_run = Total number of running processes
mem_total = Total amount of memory displayed in KBs
cpu_aidle = Percent of time since boot idle CPU
bytes_in = Number of bytes in per second
mem_buffers = Amount of buffered memory
mem_shared = Amount of shared memory
swap_total = Total amount of swap space displayed in KBs
part_max_used = Maximum percent used for all partitions

[1] https://issues.apache.org/jira/secure/attachment/12589911/diagram1.png
[2] http://entropy.gforge.inria.fr/ganglia.html
[3] http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages

Cheers,
Rajith

On Fri, Jun 21, 2013 at 7:22 PM, Rajith Siriwardana <rajithsiriward...@gmail.com> wrote:

> moving the conversation to dev.
>
> Cheers,
> Rajith
>
> On Thu, Jun 20, 2013 at 11:10 AM, Chris Mattmann <chris.mattm...@gmail.com> wrote:
>
>> Hi Rajith,
>>
>> RE: #1 yep that's the next step.
>>
>> RE: #2, I would create a pluggable function/class that allows
>> different "Besting" algorithms to be plugged in. One simple one
>> would be AverageLoad (avg between the 3 load values). Another
>> simple would be FiveMinuteLoad; another OneMinLoad; etc. I would
>> also imagine allowing ArbitraryMetricWeightedAvgLoad where it takes
>> in maybe a List<String> specifying the metric names, and then also
>> maybe a HashMap<String, Double> that identifies the metric name,
>> and then the weight to apply in the weighted average, e.g., maybe
>> {{"1minload", "3.0"}, {"5minload", "10.0"}, {"15minload", "1.0"}}
>>
>> indicating that the final load should be calculated as:
>>
>> 3*[val of 1minLoad] + 10*[val of 5minLoad] + 1*[val of 15minLoad]
>> -----------------------------------------------------------------
>>                                 3
>> Or something like the above
>>
>> for #3 (use casting and maybe Math.max)?
>>
>> for #4, see above.
>>
>> Also this should all probably go on dev@oodt.apache.org so can
>> you move the conversation there?
>>
>> Cheers,
>> Chris
>>
>> ------------------------
>> Chris Mattmann
>> chris.mattm...@gmail.com
>>
>>
>> -----Original Message-----
>> From: Rajith Siriwardana <rajithsiriward...@gmail.com>
>> Date: Wednesday, June 19, 2013 11:32 AM
>> To: jpluser <chris.a.mattm...@jpl.nasa.gov>, jpluser
>> <chris.mattm...@gmail.com>
>> Subject: [Ganglia plugin] Next steps
>>
>>
>> >Hi Chris,
>> >My next steps would be
>> >
>> >Adding the capability of creating a GangliaAssignmentMonitor from the
>> >GangliaAssignmentMonitorFactory to AssignmentMonitor.
>> >
>> >in that case I have few questions,
>> >
>> >1. GangliaAssignmentMonitor should get the XML downloaded and parsed when
>> >the AssignmentMonitor requests about the nodes current load right? and
>> >this should update the loadMap in AssignmentMonitor?
>> >
>> >2. About the current load, what it should be 15 mins ? 5 mins ? 1 min ?
>> >or should it be an average load. (since the requirement is the current
>> >load, i guess this should be a weighted average of these three load
>> >values)
>> >
>> >3. Ganglia provides the load values as percentage values. loadMap uses
>> >Integer, how the mapping should happen?
>> >
>> >4. I couldn't find anywhere which require any metric other than the load
>> >of a resource node.
>> >
>> >
>> >
>> >Thank you,
>> >Rajith
>> >
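
The ArbitraryMetricWeightedAvgLoad suggestion and the normalization, capacity and offline questions in this thread can be sketched together as below. This is only an illustration under stated assumptions: the class and method names are hypothetical, dividing by cpu_num to normalize is one possible choice, and reporting an offline node as fully loaded is one possible answer to #3. The average divides by the sum of the weights, which is one reading of "weighted average" and differs from the divisor in the quoted formula.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a pluggable weighted-average load; not existing OODT code. */
class WeightedAvgLoad {
    private final Map<String, Double> weights; // metric name -> weight

    WeightedAvgLoad(Map<String, Double> weights) {
        this.weights = new LinkedHashMap<>(weights);
    }

    /** Weighted average of the named metrics: sum(w_i * v_i) / sum(w_i). */
    double weightedAverage(Map<String, Double> metrics) {
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            Double value = metrics.get(e.getKey());
            if (value == null) {
                continue; // metric not reported: leave it out of the average
            }
            weightedSum += e.getValue() * value;
            weightTotal += e.getValue();
        }
        return weightTotal == 0.0 ? 0.0 : weightedSum / weightTotal;
    }

    /** Assumption: divide by the CPU count and clamp to [0, 1], so 1.0 means fully loaded. */
    double normalized(Map<String, Double> metrics, int cpuNum) {
        double perCpu = weightedAverage(metrics) / Math.max(1, cpuNum);
        return Math.min(1.0, Math.max(0.0, perCpu));
    }

    /** Scales the normalized load to an integer in [0, capacity] for the loadMap. */
    int toLoadMapValue(Map<String, Double> metrics, int cpuNum, int capacity, boolean offline) {
        if (offline) {
            return capacity; // assumption: an offline node has no room left
        }
        return (int) Math.round(normalized(metrics, cpuNum) * capacity);
    }

    /** The gmetad rule cited in [2]: a host is offline if TN > 4 * TMAX. */
    static boolean isOffline(long tn, long tmax) {
        return tn > 4 * tmax;
    }
}
```

With weights 3, 10 and 1 and load values 1.0, 2.0 and 4.0, the weighted average is 27/14. Reporting an offline node at full capacity means the check "loadValue <= (loadCap - curLoad)" fails for any positive loadValue, so no job would be assigned to it.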