Spike Spiegel wrote:
> On Wed, Nov 25, 2009 at 4:20 PM, Daniel Pocock <dan...@pocock.com.au> wrote:
>   
>> One problem I've been wondering about recently is the scalability of
>> gmetad/rrdtool.
>>     
>
> [cut]
>
>   
>> In a particularly large organisation, moving around the RRD files as
>> clusters grow could become quite a chore.  Is anyone putting their RRD
>> files on shared storage and/or making other arrangements to load balance
>> between multiple gmetad servers, either for efficiency or fault tolerance?
>>     
>
> We do. We run 8 gmetad servers: 2 in each colo x 3 colos + 2 centrals,
> with the RRDs stored on a RAM disk on each node. Nodes are set up with
> unicast and data is sent to both heads in the same colo for fault
> tolerance/redundancy. This is all good until you have a gmetad failure
> or need to perform maintenance on one of the nodes, because at that
> point data stops flowing in and you will have to rsync back from the
> other head once you're done. It doesn't matter how you do it (live
> rsync, or stop the other head during the sync process): you will lose
> data. That said, it could easily be argued that you have no guarantee
> that both heads have the same data to start with, because messages are
> UDP and there's no guarantee that one node hasn't lost some data the
> other has. Of course there is a noticeable difference between random
> message loss and, say, a 15-minute blackout window during maintenance,
> but then if your partitions are small enough a live rsync could
> possibly incur a small enough loss... it really depends.
>   
Thanks for sharing this - could you comment on the total number of RRDs 
per gmetad, and do you use rrdcached?
> As to shared storage, we haven't tried it, but my personal experience
> is that given how poorly a local filesystem copes with that many small
> writes and seeks, using any kind of remote FS isn't going to work.
>   
I was thinking about gmetads attached to the same SAN, not a remote FS 
over IP.  In a SAN, each gmetad has a physical path to the disk (over 
fibre channel) and there are some filesystems (e.g. GFS) and locking 
systems (DLM) that would allow concurrent access to the raw devices.  If 
two gmetads mount the filesystem concurrently, you could tell one gmetad 
`stop monitoring cluster A, sync the RRDs' and then tell the other 
gmetad to start monitoring cluster A.
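
To make that concrete, here's a rough sketch of the handoff.  The host
names, config path and the init-script restart are illustrative, and it
assumes both heads already mount the same rrd_rootdir on the shared
filesystem - none of this is existing gmetad tooling:

# Rough sketch only: hand "cluster A" from head1 to head2 over a shared
# GFS mount.  Host names, paths and the restart command are illustrative.
import subprocess

def ssh(host, cmd):
    # Run a remote command, raising if it fails.
    subprocess.check_call(['ssh', host, cmd])

# 1. head1 stops polling cluster A: drop its data_source line and restart.
ssh('head1', "sed -i '/^data_source \"cluster A\"/d' /etc/ganglia/gmetad.conf")
ssh('head1', '/etc/init.d/gmetad restart')

# 2. Make sure head1's last RRD writes have reached the shared filesystem.
ssh('head1', 'sync')

# 3. head2 takes over, writing to the same RRDs under the shared rrd_rootdir.
ssh('head2', "echo 'data_source \"cluster A\" node1:8649' >> /etc/ganglia/gmetad.conf")
ssh('head2', '/etc/init.d/gmetad restart')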

DLM is quite a heavyweight locking system (it requires a cluster manager 
and a heartbeat system).  Some enterprises run coordination services like 
Apache ZooKeeper (Google has one called Chubby), and these could 
potentially allow the gmetad servers to agree on who is polling each 
cluster.
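
For example, the "who is polling cluster A" agreement could be expressed
as a per-cluster lock in ZooKeeper.  A minimal sketch with the kazoo
client library (the znode path, hosts and the polling hook are
placeholders - nothing like this exists in gmetad today):

# Sketch only: per-cluster leadership via a ZooKeeper lock, using kazoo.
from kazoo.client import KazooClient

zk = KazooClient(hosts='zk1.example.com:2181,zk2.example.com:2181')
zk.start()

# Only one gmetad head can hold this lock at a time.  If the holder dies,
# its ephemeral znode disappears and the other head acquires the lock.
lock = zk.Lock('/ganglia/pollers/cluster-a', identifier='head1')

with lock:
    # While we hold the lock, this head is the one polling cluster A and
    # writing its RRDs on the shared filesystem.
    poll_cluster_a_until_shutdown()   # placeholder for the real polling loop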
> I see two possible solutions:
> 1. client caching
> 2. built-in sync feature
>
> In 1, gmond would cache data locally if it could not contact the
> remote end. This, IMHO, is the best solution because it not only helps
> with head failures and maintenance, but possibly addresses a whole
> bunch of other failure modes too.
>   
The problem with that is that the XML is just a snapshot.  Maybe the XML 
could contain multiple values for each metric, e.g. all values since the 
last poll?  There would need to be some way of limiting memory usage 
too, so that an agent doesn't kill the machine if nothing is polling it.
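
A bounded per-metric buffer would cover both points: values accumulate
while the collector is unreachable, and the oldest samples fall off once
the cap is reached.  gmond itself is C, so the following is only an
illustration of the data structure and policy:

# Illustration only: keep at most MAX_SAMPLES values per metric so memory
# stays capped even if nothing ever polls the agent.
from collections import defaultdict, deque
import time

MAX_SAMPLES = 360          # e.g. roughly an hour of 10-second samples

cache = defaultdict(lambda: deque(maxlen=MAX_SAMPLES))

def record(metric, value):
    # The oldest sample is silently discarded once the deque is full.
    cache[metric].append((time.time(), value))

def drain(metric):
    # On the next successful poll, hand back every value seen since the
    # previous poll and empty the buffer.
    samples, cache[metric] = list(cache[metric]), deque(maxlen=MAX_SAMPLES)
    return samples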
> 2. instead would make gmetad aware of when it last got data and able
> to ask another gmetad for the missing data, fetching until the delta
> (data loss) is small enough (user configured) that it can again
> receive data from clients. This is probably harder to implement and
> still would not guarantee no data loss, but I don't think that's a
> goal. The interesting property of this approach is that it'd open the
> door to a realtime merge of data from multiple gmetads, so that as
> long as at least one node has received a message a client would never
> see a gap, effectively providing no data loss. I'm toying with this
> solution in a personal, non-Ganglia-related project, as it's
> applicable to anything with data stored in RRDs across multiple
> locations.
>   
This would be addressed by the use of a SAN - there would be only a 
single copy of each RRD file, and the gmetad servers would need some 
agreement so that they don't both try to write the same file at the same 
time.
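
For completeness, the merge in 2 is essentially a per-timestamp union:
take each sample from whichever head saw it, so a gap only remains where
every head missed the message.  A toy sketch, assuming each head's
samples have already been fetched as (timestamp, value) pairs:

# Toy sketch of the "realtime merge" from proposal 2: combine samples held
# by several heads so a gap only remains where every head lost the message.
def merge(*series):
    merged = {}
    for samples in series:
        for ts, value in samples:
            # The first head with a value for this timestamp wins; any head
            # will do since they all saw the same UDP message.
            merged.setdefault(ts, value)
    return sorted(merged.items())

head1 = [(100, 0.5), (110, 0.7)]   # head1 missed t=120
head2 = [(100, 0.5), (120, 0.6)]   # head2 missed t=110
print(merge(head1, head2))         # [(100, 0.5), (110, 0.7), (120, 0.6)]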


