doh, forgot to reply-all
On Fri, Mar 21, 2008 at 1:38 AM, Javier Frias <[EMAIL PROTECTED]> wrote:
> On Wed, Mar 19, 2008 at 1:38 AM, Robin H. Johnson <[EMAIL PROTECTED]> wrote:
> > On Tue, Mar 18, 2008 at 08:44:13PM -0400, Javier Frias wrote:
> > > So I guess searching the lists *well* should have been my first recourse...
> > >
> > > http://lists.danga.com/pipermail/mogilefs/2007-June/001043.html
> > >
> > > mentions that the stats command is pretty heavy. I'd assume this is
> > > still the case... so the follow-up question would be:
> > >
> > > how does everyone here monitor MogileFS?
> > >
> > > I have DB monitors, port monitors, and a simple transaction monitor
> > > in the works (write a file, delete a file), but it'd be nice to map
> > > growth across domains/classes.
> >
> > I used to graph this in Munin (run the query every 5 minutes, store in RRD):
> > SELECT COUNT(fid), dmid, classid FROM file GROUP BY dmid, classid;
> > (with the dmid/classid cached). So all domains/classes were on the same
> > graph. I stopped doing it because, due to the differences in magnitude
> > of the numbers and growth rates, I needed to have multiple separate
> > graphs, and I just couldn't be bothered.
>
> Yeah, this is similar to the query I'm trying to implement. The script
> running it would compute the difference from the last run, as well as
> the time frame, so as to derive both the counts (a worthwhile statistic,
> though not really worth graphing, more just a thing you check), but
> mostly the growth rate, since I can use that to plan when I will need
> more storage nodes in the future.
>
> Another thing worth graphing is the replication stats, i.e. files in
> the replication queue, etc.
>
> > Instead, I do use the per-database graphs that Munin has for PostgreSQL:
> > postgres_block_read_
> > postgres_commits_
> > postgres_queries_
> > postgres_space_
>
> Yeah, I'm graphing these stats in MySQL as well.
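The growth-rate script Javier describes could be sketched roughly as follows. This is a minimal illustration, not code from the thread: the function and the snapshot format are invented; only the per-domain/class counts (the output of Robin's `SELECT COUNT(fid), dmid, classid FROM file GROUP BY dmid, classid;` query) come from the mail.

```python
# Hypothetical sketch: compare per-(dmid, classid) file counts from the
# previous run against the current run and divide by the elapsed time,
# giving a growth rate usable for storage-capacity planning.

def growth_rates(prev_counts, prev_ts, cur_counts, cur_ts):
    """Return files/hour per (dmid, classid) between two count snapshots.

    prev_counts/cur_counts map (dmid, classid) -> COUNT(fid);
    prev_ts/cur_ts are Unix timestamps of the two query runs.
    """
    hours = (cur_ts - prev_ts) / 3600.0
    rates = {}
    for key, cur in cur_counts.items():
        prev = prev_counts.get(key, 0)  # class may be new since last run
        rates[key] = (cur - prev) / hours
    return rates

# Example: domain 1 / class 2 grew by 7200 files over two hours.
prev = {(1, 2): 100_000}
cur = {(1, 2): 107_200}
print(growth_rates(prev, 0, cur, 7200))  # {(1, 2): 3600.0}
```

Persisting each run's counts and timestamp (e.g. to a state file or RRD, as Robin did with Munin) is what makes the delta computable on the next run.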
> > This actually raises one interesting bit that I'm not sure if anybody
> > else has seen. Approximately once a day, Mogile is doing a SELECT query
> > that returns a massive 50-80k rows, while the normal 5-minute average
> > is ~500.
>
> I hadn't noticed, but I do have spikes in my usage, so I attributed
> those to my traffic spikes.
>
> > Beyond that performance monitoring of PostgreSQL, I do have a lot of
> > stuff watched via Nagios: daemon-running and port-connection tests for
> > all the mogilefsd nodes (3), all the mogstored nodes (8), and the
> > haproxy nodes (local on each web client).
>
> Same here.
>
> > *haproxy: None of the MogileFS client code does load-balancing/failover
> > between MogileFS instances very well, so we use haproxy on loopback on
> > each of our web nodes. If you want to just contact the Mogile system,
> > instead of looking for a mogilefsd instance that is up, you just hit
> > localhost:7001, and it directs you to one that IS actually up. haproxy
> > keeps state of which ones are up, so it works well. Doing it on
> > loopback cuts down on any latency and failure issues we might have if
> > we were to have it on a standalone system.
>
> I handle it at the client level. Aside from mogstored, we run an image
> transformation proxy on each storage node that can upscale/downscale
> images. So when my client requests a file from Mogile, I do a
> get_paths, get the path, and then translate the port to my image
> transformation proxy; if this fails, I try the next path, and so on.
> Since files are distributed, I wouldn't be able to use my load balancer
> in front of the storage daemons, since a file is not guaranteed to be
> on the same path/device on all the servers.
>
> Thanks for the reply, it's cool to know what others are doing to
> monitor Mogile.
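Robin's loopback setup could look something like the fragment below, in haproxy-1.x style config. This is only a sketch: the frontend port 7001 and the loopback placement are from the mail, but the server names, addresses, and balance settings are invented.

```
# Hypothetical haproxy config on each web node: local port 7001 fans
# out to whichever mogilefsd tracker is actually up (health-checked).
listen mogilefsd 127.0.0.1:7001
    mode tcp
    balance roundrobin
    server tracker1 10.0.0.1:7001 check
    server tracker2 10.0.0.2:7001 check
    server tracker3 10.0.0.3:7001 check
```

With this in place, client code points at `localhost:7001` and never needs its own tracker-failover logic.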
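Javier's client-side approach (rewrite each get_paths result to the image proxy's port, fall through to the next path on failure) could be sketched like this. Everything here is hypothetical: `PROXY_PORT`, the `fetch` callable, and the function names are placeholders, not the real MogileFS client API.

```python
# Sketch of client-level failover across get_paths results, with the
# port rewritten to a per-storage-node image transformation proxy.
from urllib.parse import urlparse, urlunparse

PROXY_PORT = 8000  # assumed port of the image proxy on each storage node

def to_proxy(url, port=PROXY_PORT):
    """Rewrite a mogstored path to hit the local image proxy instead."""
    p = urlparse(url)
    return urlunparse(p._replace(netloc=f"{p.hostname}:{port}"))

def fetch_with_failover(paths, fetch):
    """Try each rewritten path in order; return the first success.

    `paths` is the list returned by get_paths(); `fetch` is whatever
    HTTP-get callable the client uses, raising OSError on failure.
    """
    for url in paths:
        try:
            return fetch(to_proxy(url))
        except OSError:
            continue  # this storage node is down; try the next replica
    raise RuntimeError("all paths failed")
```

As Javier notes, this per-path retry is what a front-side load balancer can't easily replicate, since a given file only exists on the subset of nodes holding its replicas.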
> > --
> > Robin Hugh Johnson
> > Gentoo Linux Developer & Infra Guy
> > E-Mail : [EMAIL PROTECTED]
> > GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
