On Sun, Dec 20, 2009 at 7:35 PM, Vladimir Vuksan <vl...@vuksan.com> wrote:
> If you lose a day or
> two or even a week of trending data that is not gonna be disaster as long
> as that data is present somewhere else.

Sure, but where? And how would the ganglia frontend tell?

> Thus I proposed a simple solution
> where even if one of the gmetads (gmetad1) fails you can either
> a. Get all the rrds (rsync) from gmetad2 before you restart gmetad1

which, unless you have a small amount of data or a fast network between
the two nodes, won't complete before the next write is initiated,
meaning the two copies won't be identical.
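To make that race concrete, here's a back-of-the-envelope sketch. The numbers are entirely hypothetical (a 20 GiB rrd tree, ~50 MiB/s effective throughput, a 15-second polling interval), just to show the scale of the problem:

```python
# Hypothetical numbers to illustrate the rsync race; none of these
# values come from a real deployment, they are just for scale.
rrd_tree_bytes = 20 * 1024**3      # 20 GiB of rrd files on gmetad2
transfer_rate = 50 * 1024**2       # ~50 MiB/s effective network throughput
poll_interval = 15                 # seconds between gmetad write rounds

sync_seconds = rrd_tree_bytes / transfer_rate
writes_during_sync = int(sync_seconds // poll_interval)

print(f"full copy takes ~{sync_seconds:.0f}s")            # ~410s
print(f"gmetad2 does ~{writes_during_sync} write rounds meanwhile")
```

So by the time the copy finishes, dozens of write rounds have already landed on gmetad2 and the trees have diverged again.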

> b. Simply start up gmetad1 and don't worry about the lost data


> As far as which data is going to be displayed you can do either
> 1. Proxy traffic to Ganglia with most up to date data

How do you tell which one has the most up-to-date data?

> 2. Change DNS record to point to Ganglia with most up to date data

Same question: which one has the most up-to-date data?

If you really mean "most recent" then both would, because both would
have fetched the last reading (assuming both are functional), but
gmetad1 would have a hole in its graphs. To me that doesn't really
count as up to date. Up to date would be the one with the most
complete data set, which you have no way to identify programmatically.

Also, assume gmetad2 now fails and both have holes: which one is the
most up to date?
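To illustrate why "most recent" can't pick a winner, here's a toy example (the timestamps are made up): both gmetads have fetched the latest reading, but each missed a different window while it was down:

```python
# Purely illustrative: hypothetical sample timestamps for the same
# metric as stored by each gmetad. Each node missed a different
# window while it was down, but both fetched the latest reading.
samples_gmetad1 = {0, 15, 30, 75, 90}   # hole between t=30 and t=75
samples_gmetad2 = {0, 15, 45, 60, 90}   # hole between t=15 and t=45

# "Most recent" is identical on both, so it tells you nothing:
assert max(samples_gmetad1) == max(samples_gmetad2) == 90

# And neither node alone has the complete set:
full_set = samples_gmetad1 | samples_gmetad2
print(sorted(full_set - samples_gmetad1))  # what gmetad1 is missing
print(sorted(full_set - samples_gmetad2))  # what gmetad2 is missing
```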

> To your last point there are chances that both gmetads fail in quick
> succession however I would think that would be a highly unlikely event.

It doesn't have to be in quick succession for you to end up with holes
in your data and no way to go back, which is my main point: as much as
you can say that no-data-loss requirements aren't really a major
concern for most people, the fact remains that with the current
codebase you can't avoid that situation, which imho isn't right.

> If you had requirements for such flawless performance you should be able to
> invest resources to resolve it.

I'm sorry, but I don't see it. Even with plenty of resources you'd
have to either put heavy restrictions in place, like centralizing the
data on a SAN (which is not really something you'd want in a
distributed setup), or add plenty of hacks to, say, replay the
contents of the rrds to some other place, and even then it's pretty
quirky.

> Makes sense ?

I guess it does if I look at it from your perspective, which, if I
understood it correctly, implies that:
* some data loss doesn't matter
* manual interaction to fix things is ok

But that isn't my perspective. Scalable (distributed) applications
should be able to guarantee no data loss by design in as many cases as
possible, without forcing you into centralized designs or hackery to
do so.

There are ways to make this possible without changes to the current
gmetad code, by adding a helper webservice that proxies access to the
rrds. That way it's perfectly fine to have different locations with
different data: the webservice takes care of interrogating one or more
gmetads/backends to retrieve the full set and presenting it to the
user. Fully distributed, no data loss. This could of course be built
into gmetad itself, by making something like port 8652 access the
rrds, but to me that's the wrong path: it makes gmetad's code more
complicated, and it's potentially functionality that has nothing to do
with ganglia and is backend-dependent.
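As a rough sketch of what that helper webservice's core merge step could look like (the names and data shapes are mine, not an existing API; fetching from each backend is left abstract):

```python
# Hypothetical sketch of the merge logic such a proxy could use.
# Each backend's data is represented here as a {timestamp: value}
# dict; in practice you'd populate these from whatever per-backend
# query you actually issue against each gmetad's rrds.
def merge_series(per_backend):
    """Combine series from several gmetads into the most complete
    series available; later backends only fill in timestamps the
    earlier ones are missing."""
    merged = {}
    for series in per_backend:
        for ts, value in series.items():
            merged.setdefault(ts, value)
    return dict(sorted(merged.items()))

# gmetad1 has a hole at t=30, gmetad2 at t=15; merged, they're complete.
gmetad1 = {0: 1.0, 15: 1.2, 45: 1.4}
gmetad2 = {0: 1.0, 30: 1.3, 45: 1.4}
print(merge_series([gmetad1, gmetad2]))
```

The point being that each backend is allowed to have holes, and completeness is reconstructed at read time rather than enforced at write time.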


"Behind every great man there's a great backpack" - B.

Ganglia-developers mailing list
