Re: 0.6 to 0.7 upgrade assumptions

2010-11-02 Thread Edward Capriolo
On Tue, Nov 2, 2010 at 2:49 PM, Gary Dusbabek gdusba...@gmail.com wrote: You'll need to convert from storage-conf.xml to cassandra.yaml and import your schema at some point.  NEWS.txt outlines the general approach (see Upgrading). Gary. On Tue, Nov 2, 2010 at 13:31, Erik Onnen

Re: Adding nodes in new data center

2010-11-01 Thread Edward Capriolo
On Mon, Nov 1, 2010 at 6:01 PM, Henry Luo h...@choicestream.com wrote: We have a cluster running in one data center, and are adding some in a new data center. There are some data already in the current cluster. We did something wrong at first by not having AutoBootstrap on, when we saw no

Re: Best practice for adding new nodes to ring

2010-10-26 Thread Edward Capriolo
On Tue, Oct 26, 2010 at 1:45 PM, Stu Hood stu.h...@rackspace.com wrote: While the adding virtual tokens/nodes to Cassandra discussion is a good one, there are a few factors that might delay (or remove?) the necessity of adding that complexity: * In Cassandra 0.7, removing load from a node

Re: Experiences with Cassandra hardware planning

2010-10-25 Thread Edward Capriolo
On Mon, Oct 25, 2010 at 11:21 AM, Eric Rosenberry e...@rosenberry.org wrote: Hey Chris- That is tough to say as we started out with no data and have been continuously loading data into the cluster.  Initially we had less data than the amount of RAM in each node (48 gigs) but we have eventually

Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data

2010-10-25 Thread Edward Capriolo
On Mon, Oct 25, 2010 at 12:37 PM, Jonathan Ellis jbel...@gmail.com wrote: On Sun, Oct 24, 2010 at 9:09 PM, Takayuki Tsunakawa tsunakawa.ta...@jp.fujitsu.com wrote: From: Jonathan Ellis jbel...@gmail.com (b) Cassandra generates input splits from the sampling of keys each node has in memory.  

Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data

2010-10-25 Thread Edward Capriolo
On Mon, Oct 25, 2010 at 10:19 PM, Takayuki Tsunakawa tsunakawa.ta...@jp.fujitsu.com wrote: Hello, Mike, Thank you for your advice. I'll close this thread with this mail (I've been afraid I was interrupting the community developers with cloudy questions.) I'm happy to know that any clearly

Re: keys_cached percent?

2010-10-25 Thread Edward Capriolo
On Mon, Oct 25, 2010 at 9:43 PM, Damick, Jeffrey jeffrey.dam...@neustar.biz wrote: Sure - so percents aren’t supported anymore in 0.7.x, which is fine, I just wanted to clarify. thanks On 10/25/10 9:31 PM, Aaron Morton aa...@thelastpickle.com wrote: To cache 100% set the value to 1. The

Re: Client listener for Cassandra Column Family Updates

2010-10-20 Thread Edward Capriolo
On Wed, Oct 20, 2010 at 11:15 AM, Potter,Lorne [Wpg] lorne.pot...@ec.gc.ca wrote: I have a 0.7beta2 four node cluster set up and a small Java program that writes to a TimeUUIDType sorted column family and another program that polls the database every x msecs to read the latest data. Instead of

Re: memtable sstable questions (0.6.4)

2010-10-20 Thread Edward Capriolo
On Wed, Oct 20, 2010 at 2:47 PM, CassUser CassUser cassu...@gmail.com wrote: Hey, As I understand it writes go directly to the commit log.  Once a threshold has been reached the data is shipped to a memtable, and again to an sstable. 1. How many memtables are created when a flush happens

Re: Wide rows or tons of rows?

2010-10-11 Thread Edward Capriolo
2010/10/11 Héctor Izquierdo Seliva izquie...@strands.com: Hi everyone. I'm sure this question or similar has come up before, but I can't find a clear answer. I have to store an unknown number of items in cassandra, which can vary from a few hundreds to a few millions per customer. I read

Re: Multi Data Center Strategy

2010-10-11 Thread Edward Capriolo
On Mon, Oct 11, 2010 at 9:53 AM, Henry Luo h...@choicestream.com wrote: We have an application that does a lot of updates to the rows. We use replication factor of 3 and are moving to multiple data centers. We would like to accomplish the following setup: Data are replicated to other data

Re: Schema question

2010-10-03 Thread Edward Capriolo
On Sun, Oct 3, 2010 at 11:02 AM, Simon Reavely simon.reav...@gmail.com wrote: Two questions: 1. So this compaction challenge is a CPU issue or a disk IO issue in your case? 2. In other places people have recommended adjustments from the defaults to control compaction overhead...did you

Re: UnavailableException when data grows

2010-09-30 Thread Edward Capriolo
After nodetool move you have to run nodetool cleanup. On Thu, Sep 30, 2010 at 3:45 PM, Rana Aich aichr...@gmail.com wrote: I have arranged my initial tokens and get this result: Address       Status     Load          Range      Ring 17014118346046923173168730371588000 192.168.202.1 Up  

Re: High number of DigestMismatchException

2010-09-26 Thread Edward Capriolo
On Sun, Sep 26, 2010 at 5:33 PM, Jonathan Ellis jbel...@gmail.com wrote: It may be an indication of a lower-level problem in your cluster, e.g. flakey network causing FD false positives causing writes to be initially replicated to less than all 3 nodes. On Sun, Sep 26, 2010 at 11:53 AM,

Re: Cassandra performance

2010-09-20 Thread Edward Capriolo
On Sat, Sep 18, 2010 at 9:26 AM, Peter Schuller peter.schul...@infidyne.com wrote:  - performance (it should be not as much less than shard of MySQL and scale linearly, we want to have not more than 10K inserts per second of writes, and probably not more than 1K/s reads which will be mostly

Re: what are ways to keep the SSTable Count down low

2010-09-20 Thread Edward Capriolo
On Mon, Sep 20, 2010 at 3:14 PM, Dathan Pattishall datha...@gmail.com wrote: How do you set the compaction threshold from storage-conf.xml? is this possible? What is the consensus on a basic Key-Value store of setting the compactionthreshold min/max from ./nodetool --host=localhost

Re: Migration from 6.X to 7.X

2010-09-07 Thread Edward Capriolo
On Mon, Sep 6, 2010 at 8:39 PM, Jonathan Ellis jbel...@gmail.com wrote: On Mon, Sep 6, 2010 at 4:04 PM, Edward Capriolo edlinuxg...@gmail.com wrote: I was not aware of that. Also is the default for 6.o non framed and 7.o framed? Yes. I was thinking possibly replace cassanda.client detect

Re: drop/recreate column family race condition

2010-09-07 Thread Edward Capriolo
On Tue, Sep 7, 2010 at 5:10 PM, Jonathan Ellis jbel...@gmail.com wrote: On Tue, Sep 7, 2010 at 3:55 PM, B. Todd Burruss bburr...@real.com wrote: using 0.7 latest from trunk as of few minutes ago.  1 client, 1 node i have the scenario where i want to drop a column family and recreate it - unit

anyone want to talk about read_repair_chance

2010-09-05 Thread Edward Capriolo
Read repair chance looks to be an awesome feature. We have a pretty high cache hit rate so I would assume read repair chance would reduce a lot of bandwidth and disk activity across our cluster. Does anyone have some statistics or experiences they want to share? Edward

Re: Cache capacity set with JConsole is lost after restart

2010-09-03 Thread Edward Capriolo
On Fri, Sep 3, 2010 at 9:22 AM, Viktor Jevdokimov viktor.jevdoki...@adform.com wrote: Hi, We’re not setting cache capacity upon creation of Column Family, since the type and capacity is unknown at that time. By default it = 0. After Column Family has enough data and we could decide on

Re: Cassandra on AWS across Regions

2010-09-01 Thread Edward Capriolo
On Wed, Sep 1, 2010 at 4:42 PM, Peter Fales peter.fa...@alcatel-lucent.com wrote: I probably should have made it clear that I wasn't proposing this as an official patch (as you point out, it's not general enough for production use).   I'm just looking for feedback on the concept (thanks!) and

Re: Question regarding tombstone removal on 0.6.4

2010-08-31 Thread Edward Capriolo
On Tue, Aug 31, 2010 at 4:06 PM, Jonathan Ellis jbel...@gmail.com wrote: does http://wiki.apache.org/cassandra/DistributedDeletes and http://wiki.apache.org/cassandra/MemtableSSTable help? On Tue, Aug 31, 2010 at 3:04 PM, Dwight Smith dwight.sm...@alcatel-lucent.com wrote: Hi I am

Re: Cassandra HAProxy

2010-08-30 Thread Edward Capriolo
On Mon, Aug 30, 2010 at 12:40 PM, Dave Viner davevi...@pobox.com wrote: FWIW - we've been using HAProxy in front of a cassandra cluster in production and haven't run into any problems yet.  It sounds like our cluster is tiny in comparison to Anthony M's cluster.  But I just wanted to mention

Re: Read before Write

2010-08-27 Thread Edward Capriolo
On Fri, Aug 27, 2010 at 1:26 PM, Ran Tavory ran...@gmail.com wrote: I haven't benchmarked so it's purely theoretical. If there's no caching then I'm pretty sure just writing would yield better performance. If you do cache rows/keys it really depends on your hit ratio. Naturally if you have a

Re: Follow-up post on cassandra configuration with some experiments on GC tuning

2010-08-27 Thread Edward Capriolo
On Fri, Aug 27, 2010 at 6:49 PM, Jonathan Ellis jbel...@gmail.com wrote: I suspect something else is making the difference for ecapriolo.  The documentation says, The incremental mode is meant to lessen the impact of long concurrent phases by periodically stopping the concurrent phase to

Re: Follow-up post on cassandra configuration with some experiments on GC tuning

2010-08-25 Thread Edward Capriolo
On Tue, Aug 24, 2010 at 11:29 AM, Mikio Braun mi...@cs.tu-berlin.de wrote: Dear all, thanks again for all the comments I got on my last post. I've played a bit with different GC settings and got my Cassandra instance to run very nicely with 8GB

Re: Node OOM Problems

2010-08-22 Thread Edward Capriolo
On Sun, Aug 22, 2010 at 7:11 AM, Wayne wav...@gmail.com wrote: Currently each node has 4x1TB SATA disks. In MySQL we have 15tb currently with no replication. To move this to Cassandra replication factor 3 we need 45TB assuming the space usage is the same, but it is probably more. We had

Re: Node OOM Problems

2010-08-19 Thread Edward Capriolo
On Thu, Aug 19, 2010 at 2:48 PM, Wayne wav...@gmail.com wrote: I am having some serious problems keeping a 6 node cluster up and running and stable under load. Any help would be greatly appreciated. Basically it always comes back to OOM errors that never seem to subside. After 5 minutes or 3

Re: Node OOM Problems

2010-08-19 Thread Edward Capriolo
in term of GC. Thanks. On Thu, Aug 19, 2010 at 9:44 PM, Edward Capriolo edlinuxg...@gmail.com wrote: On Thu, Aug 19, 2010 at 2:48 PM, Wayne wav...@gmail.com wrote: I am having some serious problems keeping a 6 node cluster up and running and stable under load. Any help would be greatly

Re: Node OOM Problems

2010-08-19 Thread Edward Capriolo
On Thu, Aug 19, 2010 at 4:49 PM, Wayne wav...@gmail.com wrote: What is my live set? Is the system CPU bound given the few statements below? This is from running 4 concurrent processes against the node...do I need to throttle back the concurrent read/writers? I do all reads/writes as Quorum.

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-18 Thread Edward Capriolo
On Wed, Aug 18, 2010 at 10:51 AM, Jonathan Ellis jbel...@gmail.com wrote: If you read the stack traces you pasted, the node in question ran out of diskspace.  When you have 25% space free this is not surprising. But fundamentally you are missing something important from your story here.  

Re: cache sizes using percentages

2010-08-17 Thread Edward Capriolo
On Tue, Aug 17, 2010 at 1:55 PM, Artie Copeland yeslinux@gmail.com wrote: if i set a key cache size of 100% the way i understand how that works is: - the cache is not write through, but read through - a key gets added to the cache on the first read if not already available - the size of

Re: cassandra for a inbox search with high reading qps

2010-08-17 Thread Edward Capriolo
On Tue, Aug 17, 2010 at 10:55 PM, Chen Xinli chen.d...@gmail.com wrote: Hi, We are going to use cassandra for searching purpose like inbox search. The reading qps is very high, we'd like to use ConsistencyLevel.ONE for reading and disable read-repair at the same time. For reading consistency

Hive Storage Handler for Cassandra

2010-08-16 Thread Edward Capriolo
Hello, Anyone interested in doing map/reduce on Cassandra data should take a look at Cassandra Storage Handler for Hive. Storage handlers give Hive the ability to work with data outside HDFS in a more natural way. Support is now in place for reading and writing to/from Standard Column Families

Re: indexing rows ordered by int

2010-08-15 Thread Edward Capriolo
On Sunday, August 15, 2010, S Ahmed sahmed1...@gmail.com wrote: For CF that I need to perform range scans on, I create separate CF that have custom ordering. Say a CF holds comments on a story (like comments on a reddit or digg story post) So if I need to order comments by votes, it seems I

a plea not to remove rowsize warning

2010-08-11 Thread Edward Capriolo
Hello all, I recently posted on list about a situation where two of my nodes from my 16 node were garbage collecting and at ooming. I was able to move my xmx from 9gb to 11gb to see that rather then my memory saw tooth. I would saw tooth around 4 gb before memory shot up like a rocket. After

Growing commit log directory.

2010-08-09 Thread Edward Capriolo
I have a 16 node 6.3 cluster and two nodes from my cluster are giving me major headaches. 10.71.71.56 Up 58.19 GB 10827166220211678382926910108067277| ^ 10.71.71.61 Down 67.77 GB 123739042516704895804863493611552076888v | 10.71.71.66 Up 43.51 GB

Re: Growing commit log directory.

2010-08-09 Thread Edward Capriolo
On Mon, Aug 9, 2010 at 8:20 PM, Jonathan Ellis jbel...@gmail.com wrote: what does tpstats or other JMX monitoring of the o.a.c.concurrent stages show? On Mon, Aug 9, 2010 at 4:50 PM, Edward Capriolo edlinuxg...@gmail.com wrote: I have a 16 node 6.3 cluster and two nodes from my cluster

Re: unable to start cassandra

2010-08-03 Thread Edward Capriolo
On Tue, Aug 3, 2010 at 10:47 AM, Maciej Lisowski m.lisow...@powerprice.pl wrote: Hi all, I’m new here and new with Cassandra and I’ve got problem to run it (v. 0.6.4) with jdk1.6.0_21. When I type “cassandra” to run it I get error: ERROR 16:23:53,803 Uncaught exception in thread

Re: unable to start cassandra

2010-08-03 Thread Edward Capriolo
On Tue, Aug 3, 2010 at 11:44 AM, Edward Capriolo edlinuxg...@gmail.com wrote: On Tue, Aug 3, 2010 at 10:47 AM, Maciej Lisowski m.lisow...@powerprice.pl wrote: Hi all, I’m new here and new with Cassandra and I’ve got problem to run it (v. 0.6.4) with jdk1.6.0_21. When I type “cassandra

Re: how to recover cassandra data

2010-08-02 Thread Edward Capriolo
On Mon, Aug 2, 2010 at 9:11 AM, john xie shanfengg...@gmail.com wrote: ReplicationFactor = 3 one day i stop 192.168.1.147 and remove cassandra data by mistake, can i recover  192.168.1.147's cassandra data by restart cassandra ?    DataFileDirectories         

Re: Quick Poll: Server names

2010-07-27 Thread Edward Capriolo
On Tue, Jul 27, 2010 at 11:49 AM, uncle mantis uncleman...@gmail.com wrote: Ah S**T! The Pooh server is down again! =) What does one do if they run out of themed names? Regards, Michael On Tue, Jul 27, 2010 at 10:46 AM, Brett Thomas brettptho...@gmail.com wrote: I like names of
