On Sun, Dec 20, 2009 at 7:35 PM, Vladimir Vuksan <vl...@vuksan.com> wrote:

> If you lose a day or two or even a week of trending data that is not
> going to be a disaster as long as that data is present somewhere else.
Sure, but where? How would the ganglia frontend tell?

> Thus I proposed a simple solution where even if one of the gmetads
> (gmetad1) fails you can either
>
> a. Get all the rrds (rsync) from gmetad2 before you restart gmetad1

Which, unless you have a small amount of data or a fast network between
the two nodes, won't complete before the next write is initiated, meaning
they won't be identical.

> b. Simply start up gmetad1 and don't worry about the lost data

Sure.

> As far as which data is going to be displayed you can do either
>
> 1. Proxy traffic to Ganglia with most up to date data

How do you tell which one has the most up to date data?

> 2. Change DNS record to point to Ganglia with most up to date data

Same question: which one has the most up to date data? If you really mean
"most recent" then both would, because both would have fetched the last
reading assuming they are both functional, but gmetad1 would have a hole
in its graphs. To me that does not really count as up to date. Up to date
would be the one with the most complete data set, which you have no way
to identify programmatically. Also, assume gmetad2 now fails too and both
have holes: which one is the most up to date?

> To your last point there are chances that both gmetads fail in quick
> succession however I would think that would be a highly unlikely event.

It doesn't have to be in quick succession for you to end up in a
condition where you have holes in your data and no way to go back, which
is my main point: as much as you can say that no-data-loss requirements
aren't really a major concern for most people, the fact remains that with
the current codebase you can't avoid that situation, which imho isn't
right.

> If you had requirements for such flawless performance you should be
> able to invest resources to resolve it.

I'm sorry, but I don't see it.
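To make the "most up to date" ambiguity concrete, here is a minimal sketch with hypothetical sample data, pretending each gmetad's rrd has been flattened into a Python list on a common timestamp grid, with None marking a hole left by that node's downtime (the actual rrd fetch isn't shown):

```python
# Hypothetical data: each gmetad missed some samples while it was down.
gmetad1 = [10, 12, None, None, 15, 17]  # hole while gmetad1 was down
gmetad2 = [10, 12, 13, 14, None, 17]    # hole while gmetad2 was down

def latest(series):
    """Index of the newest sample actually present."""
    return max(i for i, v in enumerate(series) if v is not None)

def completeness(series):
    """Number of samples actually present."""
    return sum(v is not None for v in series)

# By "most recent reading" both copies look identical...
assert latest(gmetad1) == latest(gmetad2)

# ...while "most complete" differs, but you can only tell by
# inspecting both data sets, which a proxy or DNS switch can't do.
print(completeness(gmetad1), completeness(gmetad2))  # prints: 4 5
```

And once both nodes have failed at different times, neither copy is complete, so there is no single "best" one to point traffic at.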
Even with plenty of resources you'd have to either put some heavy
restrictions in place, like centralizing data on a SAN, which is not
really something you'd want in a distributed setup, or add plenty of
hacks to, for example, replay the content of the rrds to some other
place, and even then it's pretty quirky.

> Makes sense?

I guess it does if I look at it from your perspective, which, if I
understood it correctly, implies that:

* some data loss doesn't matter
* manual intervention to fix things is ok

But that isn't my perspective. Scalable (distributed) applications should
be able to guarantee by design no data loss in as many cases as possible,
and not force you into centralized designs or hackery to do so.

There are ways to make this possible without changes to the current
gmetad code, by adding a helper webservice that proxies access to the
rrds. This way it's perfectly fine to have different locations with
different data, and the webservice will take care of interrogating one or
more gmetads/backends to retrieve the full set and present it to the
user. Fully distributed, no data loss.

This could of course be built into gmetad by making something like port
8652 access the rrds, but to me that's the wrong path: it makes gmetad's
code more complicated, and it's potentially functionality that has
nothing to do with ganglia and is backend dependent.

Thoughts?

-- 
"Behind every great man there's a great backpack" - B.

_______________________________________________
Ganglia-developers mailing list
Gangliaemail@example.com
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
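P.S. A minimal sketch of the merge step such a helper webservice would perform, under the same assumptions as before (each backend's copy of a series flattened to a list on a common timestamp grid, None marking holes; how the proxy actually fetches from each gmetad/backend is deliberately left out, since that part is backend dependent):

```python
def merge(series_list):
    """Union of several partially-complete copies of the same series:
    for each timestamp, take the sample from the first backend that
    has one, so holes in one copy are filled from the others."""
    length = max(len(s) for s in series_list)
    merged = []
    for i in range(length):
        sample = next((s[i] for s in series_list
                       if i < len(s) and s[i] is not None), None)
        merged.append(sample)
    return merged

# Two copies, each with a hole from its own downtime:
gmetad1 = [10, 12, None, None, 15, 17]
gmetad2 = [10, 12, 13, 14, None, 17]
print(merge([gmetad1, gmetad2]))  # prints: [10, 12, 13, 14, 15, 17]
```

Neither backend alone has the full series, but the proxy can present the complete set to the user, which is the whole point: no single "most up to date" copy needs to exist.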