Re: Monitoring with Cacti
Cloudkick does monitor JMX now. That + custom alerts is pretty powerful. I work for Cloudkick, btw.

On Sun, Sep 12, 2010 at 9:39 AM, Dave Viner davevi...@pobox.com wrote:

I haven't tried cacti, but I'm using CloudKick as an external service for monitoring Cassandra. It's super easy to get set up. Happy to share my setup if that'd help. It doesn't currently monitor JMX information, but it does offer some basic checks like thread pool and column family stats - https://support.cloudkick.com/Cassandra_Checks.

Dave Viner

On Fri, Sep 10, 2010 at 8:31 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

On Fri, Sep 10, 2010 at 7:29 PM, aaron morton aa...@thelastpickle.com wrote:

Am going through the rather painful process of trying to monitor Cassandra using Cacti (it's what we use at work). At the moment it feels like a losing battle :) Does anyone know of some Cacti resources for monitoring the JVM or Cassandra metrics other than...

mysql-cacti-templates http://code.google.com/p/mysql-cacti-templates/ - provides templates and data sources that require ssh and can monitor JVM heap and a few other things.

cassandra-cacti-m6 http://www.jointhegrid.com/cassandra/cassandra-cacti-m6.jsp - coded for version 0.6*; I have made some changes to stop it looking for stats that no longer exist. Missing some metrics I think, but it's probably the best bet so far. If I get it working I'll contribute it back to them. Most of the problems were probably down to how much effort it takes to set up Cacti.

jmxterm http://www.cyclopsgroup.org/projects/jmxterm/ - allows for command-line access to JMX. I started down the path of writing a Cacti data source to use this just to see how it worked. Looks like a lot of work.

Thanks for any advice.
Aaron

Setting up Cacti is easy, the second time, and third time :) As for cassandra-cacti-m6 (I am the author):
Unfortunately, I have been fighting the JMX switcharoo battle for about 3 years now (hadoop/hbase/cassandra/hornetq/vserver). In a nutshell, there is ALWAYS work involved. First, because as you noticed, attributes change/get removed/added/renamed. Second, it takes a human to logically group things together. For example, if you have two items, cache hits and cache misses, you really do not want two separate graphs that will scale independently. You want one slick stacked graph with nice colors, and you want a CDEF to calculate the cache hit percentage by dividing one into the other, and show that at the bottom.

If you want to have a 0.7 branch of cassandra-cacti-m6 I would love the help. We are not on 0.7 yet, so I have not had the time just to go out and make graphs for a version we are not using yet :) but if you come up with patches they are happily accepted.

Edward

--
Dan Di Spaltro
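The grouping Edward describes (hits and misses on one graph, with a CDEF-derived hit percentage) implies a data source that hands Cacti several fields at once. Below is a minimal sketch of such a data-input script, assuming jmxterm prints attributes as `Name = value;` lines; the attribute names and exact output format here are illustrative assumptions, not taken from cassandra-cacti-m6.

```python
# Hypothetical sketch: turn jmxterm-style output lines (assumed to look
# like "Hits = 900;") into Cacti's space-separated "field:value" data
# input format. Attribute names are made up for illustration.
import re

def parse_jmx_output(text):
    """Parse lines like 'Attr = 123;' into a dict of floats."""
    values = {}
    for line in text.splitlines():
        m = re.match(r"\s*([\w.]+)\s*=\s*([\d.]+)\s*;?\s*$", line)
        if m:
            values[m.group(1)] = float(m.group(2))
    return values

def cacti_fields(values):
    """Render the dict in Cacti's 'field:value' space-separated form."""
    return " ".join("%s:%s" % (k, v) for k, v in sorted(values.items()))

sample = "Hits = 900;\nMisses = 100;"
vals = parse_jmx_output(sample)
# Same idea as the CDEF: derive hit percentage from hits / (hits + misses).
vals["HitPct"] = 100.0 * vals["Hits"] / (vals["Hits"] + vals["Misses"])
print(cacti_fields(vals))  # HitPct:90.0 Hits:900.0 Misses:100.0
```

The derived percentage could equally be computed inside Cacti itself with an RPN CDEF on the graph, which keeps the script's output limited to the raw counters.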
Re: TechCrunch article on Twitter and Cassandra
This sounds more like high-throughput external analytics, aka they will know all the queries consumers will use. This isn't for internal analytics.

On Sat, Jul 10, 2010 at 10:33 AM, Marty Greenia martygree...@gmail.com wrote:

It almost seems counter-intuitive. For analytics, you'd think they'd want a database that supports more sophisticated query functionality (SQL), whereas for everyday tweet storage, something fast and high-throughput (Cassandra) makes sense. I'd be curious to hear the details as well.

On Sat, Jul 10, 2010 at 10:25 AM, S Ahmed sahmed1...@gmail.com wrote:

Nice link. From what I understood, they are not using it to store tweets but rather will use MySQL? I wish they went into more detail as to why...

On Sat, Jul 10, 2010 at 1:25 AM, Kochheiser,Todd W - TOK-DITT-1 twkochhei...@bpa.gov wrote:

A good read. http://techcrunch.com/2010/07/09/twitter-analytics-mysql/

Todd

--
Dan Di Spaltro
Re: Time-series data model
This is actually fairly similar to how we store metrics at Cloudkick. Below has a much more in-depth explanation of some of that:
https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/

So we store each natural point in the NumericArchive table:

<ColumnFamily CompareWith="LongType" Name="NumericArchive" />
<ColumnFamily CompareWith="LongType" Name="Rollup5m" ColumnType="Super" CompareSubcolumnsWith="BytesType" />
<ColumnFamily CompareWith="LongType" Name="Rollup20m" ColumnType="Super" CompareSubcolumnsWith="BytesType" />
<ColumnFamily CompareWith="LongType" Name="Rollup30m" ColumnType="Super" CompareSubcolumnsWith="BytesType" />
<ColumnFamily CompareWith="LongType" Name="Rollup60m" ColumnType="Super" CompareSubcolumnsWith="BytesType" />
<ColumnFamily CompareWith="LongType" Name="Rollup4h" ColumnType="Super" CompareSubcolumnsWith="BytesType" />
<ColumnFamily CompareWith="LongType" Name="Rollup12h" ColumnType="Super" CompareSubcolumnsWith="BytesType" />
<ColumnFamily CompareWith="LongType" Name="Rollup1d" ColumnType="Super" CompareSubcolumnsWith="BytesType" />

Our keys look like: serviceuuid.metric-name

Anyways, this has been working out very well for us.

2010/4/15 Ted Zlatanov t...@lifelogs.com:

On Thu, 15 Apr 2010 11:27:47 +0200 Jean-Pierre Bergamin ja...@ractive.ch wrote:

JB> On 14.04.2010 15:22, Ted Zlatanov wrote:

On Wed, 14 Apr 2010 15:02:29 +0200 Jean-Pierre Bergamin ja...@ractive.ch wrote:

JB> The metrics are stored together with a timestamp. The queries we want to
JB> perform are:
JB> * The last value of a specific metric of a device
JB> * The values of a specific metric of a device between two timestamps t1 and t2

Make your key devicename-metricname-MMDD-HHMM (with whatever time sharding makes sense to you; I use UTC by-hours and by-day in my environment). Then your supercolumn is the collection time as a LongType, and your columns inside the supercolumn can express the metric in detail (collector agent, detailed breakdown, etc.).

JB> Just for my understanding. What is time sharding?
JB> I couldn't find an explanation somewhere. Do you mean that the time-series
JB> data is rolled up in 5 minutes, 1 hour, 1 day etc. slices?

Yes. The usual meaning of "shard" in the RDBMS world is to segment your database by some criteria, e.g. US vs. Europe in Amazon AWS, because their data centers are laid out so. I was taking a linguistic shortcut to mean "break down your rows by some convenient criteria." You can actually set up your Partitioner in Cassandra to literally shard your keyspace rows based on the key, but I just meant "slice" in my note.

JB> So this would be defined as:
JB> <ColumnFamily Name="measurements" ColumnType="Super" CompareWith="UTF8Type" CompareSubcolumnsWith="LongType" />
JB> So when I want to read all values of one metric between two timestamps
JB> t0 and t1, I'd have to read the supercolumns that match a key range
JB> (device1:metric1:t0 - device1:metric1:t1) and then all the
JB> supercolumns for this key?

Yes. This is a single multiget if you can construct the key range explicitly. Cassandra loads a lot of this in memory already and filters it after the fact; that's why it pays to slice your keys and to stitch them together on the client side if you have to go across a time boundary. You'll also get better key load balancing with deeper slicing if you use the randomizing partitioner. In the result set, you'll get each matching supercolumn with all the columns inside it. You may have to page through supercolumns.

Ted

--
Dan Di Spaltro
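Both schemes in this thread, Ted's time-sharded row keys and the rollup tables described upthread, can be sketched in a few lines of client-side code. This is an illustrative reconstruction, not code from either system: the exact key format (`device-metric-YYYYMMDD-HH`, UTC, hour granularity) and the choice of averaging for rollups are assumptions.

```python
# Sketch of (1) enumerating time-sharded row keys so a range query can
# be stitched together as one multiget, and (2) bucketing raw points
# into fixed-width rollup slices (e.g. 300 s for a Rollup5m table).
from collections import defaultdict
from datetime import datetime, timedelta, timezone

def hour_keys(device, metric, t0, t1):
    """Row keys for every UTC hour bucket touched by [t0, t1]."""
    start = t0.replace(minute=0, second=0, microsecond=0)
    keys = []
    while start <= t1:
        keys.append("%s-%s-%s" % (device, metric, start.strftime("%Y%m%d-%H")))
        start += timedelta(hours=1)
    return keys

def rollup(points, width):
    """Average raw (unix_ts, value) points into width-second buckets."""
    acc = defaultdict(lambda: [0.0, 0])
    for ts, value in points:
        bucket = ts - (ts % width)   # truncate to the bucket start
        acc[bucket][0] += value
        acc[bucket][1] += 1
    return {ts: total / n for ts, (total, n) in sorted(acc.items())}

t0 = datetime(2010, 4, 15, 10, 30, tzinfo=timezone.utc)
t1 = datetime(2010, 4, 15, 12, 5, tzinfo=timezone.utc)
print(hour_keys("device1", "metric1", t0, t1))
print(rollup([(100, 1.0), (250, 3.0), (400, 5.0)], 300))
```

A query spanning an hour boundary then fetches all the enumerated keys in one multiget and merges the supercolumns client-side, which is the stitching Ted refers to.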
Re: Stalled Bootstrapping Process
But I didn't restart the red one.

On Thu, Apr 1, 2010 at 12:18 PM, Jonathan Ellis jbel...@gmail.com wrote:

There shouldn't be anything to clean up. (The temporary streaming files it anticompacted are automatically removed on restart.)

On Thu, Apr 1, 2010 at 2:17 PM, Dan Di Spaltro dan.dispal...@gmail.com wrote:

Okay, so should I run any more commands like cleanup before?

On Thu, Apr 1, 2010 at 12:09 PM, Jonathan Ellis jbel...@gmail.com wrote:

Bootstrap source restarting will always fail bootstrap. You'll need to restart the blue one too now, I'm afraid.

On Thu, Apr 1, 2010 at 2:01 PM, Dan Di Spaltro dan.dispal...@gmail.com wrote:

Before the red one rebooted it had 1 active STREAM-STAGE task. Now it has 0 in STREAM-STAGE.

On Thu, Apr 1, 2010 at 11:57 AM, Dan Di Spaltro dan.dispal...@gmail.com wrote:

The red one. Gary - both say "nothing is happening", with no destinations or sources.

On Thu, Apr 1, 2010 at 11:55 AM, Jonathan Ellis jbel...@gmail.com wrote:

Which node rebooted, the red one, or the blue one?

On Thu, Apr 1, 2010 at 11:26 AM, Dan Di Spaltro dan.dispal...@gmail.com wrote:

So we are adding another node to the cluster with the latest 0.6 branch (RC1). It seems to be hung in some limbo state. Before bootstrapping, our cluster had 50-60GB spread fairly evenly across 4 machines, with RF=3. One machine had more load than the others, and sure enough bootstrapping selected that node; that is the red machine. The light blue machine is the new machine. I have attached a graph to illustrate when the bootstrap process started. In jconsole the StreamingService status was "performing anticompaction..." for over 18-24 hrs. It is currently at "nothing is happening". It did have 1 active STREAM-STAGE task, but the machine had to be rebooted for something unrelated to Cassandra.

Now the light blue machine appears to be getting data, but it's growing at virtually the same rate as the other machines, which makes me think it is part of the cluster and not actually streaming data from the machine it's supposed to. Any other ideas on how to debug?

--
Dan Di Spaltro
Re: Stalled Bootstrapping Process
Sorry, I meant the red one restarted about a day ago. The graph shows the dip in disk space, but it nowhere near returned to the previous amount of disk usage. I was referring to how the red one didn't reclaim all its space (I figure about 60GB actually belongs on that machine). Is that normal (it's currently taking up about 100GB)? 2 minutes ago, I restarted the blue one. Now the StreamingService task is "performing anti-compaction" on the red one.

On Thu, Apr 1, 2010 at 12:25 PM, Jonathan Ellis jbel...@gmail.com wrote:

On Thu, Apr 1, 2010 at 2:22 PM, Dan Di Spaltro dan.dispal...@gmail.com wrote:

But I didn't restart the red one.

On Thu, Apr 1, 2010 at 11:57 AM, Dan Di Spaltro dan.dispal...@gmail.com wrote:

The red one.

On Thu, Apr 1, 2010 at 11:55 AM, Jonathan Ellis jbel...@gmail.com wrote:

Which node rebooted, the red one, or the blue one?

I'm confused.

--
Dan Di Spaltro
Re: Hackathon?!?
Yeah, the 22nd seems like it's as good as it's going to get. I'll bring you a b-day present =)

Best,

On Mon, Mar 22, 2010 at 2:20 PM, Chris Goffinet goffi...@digg.com wrote:

Dan, did we all agree April 22 works for all? -Chris

On Mon, Mar 22, 2010 at 2:13 PM, Dan Di Spaltro dan.dispal...@gmail.com wrote:

Great - Chris, you still going to put together the invite?

On Thu, Mar 11, 2010 at 5:36 AM, Jonathan Ellis jbel...@gmail.com wrote:

Ack, I agreed to speak at http://nosqleu.com/. I never did hear a final date, but they put up a schedule online (April 20-22). But the 22nd probably is a better date, and Eric and Stu are fully capable of representing Rackspace without me. :) -Jonathan

On Wed, Mar 10, 2010 at 10:50 PM, Chris Goffinet goffi...@digg.com wrote:

We could do it on April 22 (1 week later), that's my birthday :-) What better way to celebrate, haha. -Chris

On Mar 10, 2010, at 9:58 AM, Jonathan Ellis wrote:

I'm in either way, but if we push it a week later then the Twitter guys could (a) make it and (b) pimp it at their own conference.

On Wed, Mar 10, 2010 at 12:26 AM, Jeff Hodges jhod...@twitter.com wrote:

Ah, hell. Thought this was the first day. Can't make it. -- Jeff

On Mar 9, 2010 9:32 PM, Ryan King r...@twitter.com wrote:

I'm already committed to talking about Cassandra that day at our company's developer conference (chirp.twitter.com). -ryan

On Tue, Mar 9, 2010 at 6:26 PM, Jeff Hodges jhod...@twitter.com wrote:

I'm down. -- Jeff

...

--
Dan Di Spaltro