Best Practices

2010-07-14 Thread Mubarak Seyed
Are there any best practices for Storage configurations, MemTable thresholds and Linux performance tuning to tune Cassandra nodes? -- Thanks, Mubarak Seyed.

Frequent crashes

2010-07-14 Thread 王一锋
Hi, Has anybody done any memory usage analysis for Cassandra? How much memory does Cassandra need to manage 300G of data load? How much extra memory will be needed when doing compaction? Regarding mmap, memory usage will be determined by the OS so it has nothing to do with the heap size of

Re: Frequent crashes

2010-07-14 Thread Peter Schuller
How much memory does cassandra need to manage 300G of data load? How much extra memory will be needed when doing compaction? For one thing it depends on the data. One thing that scales linearly (but with a low constant) with the amount of data is the bloom filters. If those 300 GB correspond

Re: How to stop Cassandra running in embedded mode

2010-07-14 Thread Andriy Kopachevsky
Ran, I know how to run a test in its own thread with the Maven Surefire plugin, but I'm not sure how to do this with a separate JVM for each test. How are you doing this? Thanks. On Fri, Jul 9, 2010 at 10:33 PM, Ran Tavory ran...@gmail.com wrote: The workaround I do is fork always. Each test pulls up its own

Re: How to stop Cassandra running in embedded mode

2010-07-14 Thread Ran Tavory
Look at my pom. It has <forkMode>always</forkMode>: http://github.com/rantav/hector/blob/master/pom.xml#L95 On Wed, Jul 14, 2010 at 3:02 PM, Andriy Kopachevsky kopachev...@gmail.com wrote: Ran, I know how to run a test in its own thread with the Maven Surefire plugin, but I'm not sure how to do this with a separate JVM for

RE: Using Pelops with Cassandra 0.7.X

2010-07-14 Thread Dop Sun
Will Hector release a version along with 0.7, or will there be a beta or alpha before the official release of 0.7? I’m planning to update my client to work with Cassandra 0.7 trunk now, and I have a dependency on your library. :) Dop From: Ran Tavory [mailto:ran...@gmail.com] Sent: Wednesday,

Re: Authentication

2010-07-14 Thread Jonathan Ellis
Sounds good to me. On Wed, Jul 14, 2010 at 12:25 AM, Mike Malone m...@simplegeo.com wrote: Yep, as Ben said, we're not asking for anyone to write this for us. We've been playing with some ideas around encryption between EC2 data-centers/regions (intra-region is already secure enough for us --

Re: Too many open files [was Re: Minimizing the impact of compaction on latency and throughput]

2010-07-14 Thread Jonathan Ellis
A SocketException means this is coming from the network, not the sstables. Knowing the full error message would be nice, but just about any problem on that end should be fixed by adding connection pooling to your client. (moving to user@) On Wed, Jul 14, 2010 at 5:09 AM, Thomas Downing

Re: NYC Cassandra training

2010-07-14 Thread Jonathan Ellis
Turns out we can get a list from Eventbrite: http://www.eventbrite.com/org/474011012?s=1926097 On Tue, Jul 13, 2010 at 3:09 PM, Jonathan Ellis jbel...@gmail.com wrote: On Fri, Jul 9, 2010 at 9:36 AM, Jeremy Dunck jdu...@gmail.com wrote: On Fri, Jul 2, 2010 at 1:08 PM, Jonathan Ellis

Denver and Seattle Cassandra training

2010-07-14 Thread Jonathan Ellis
Denver on Sept 10, Seattle on Oct 8: http://www.eventbrite.com/org/474011012?s=1926097 -- Jonathan Ellis, Project Chair, Apache Cassandra; co-founder of Riptano, the source for professional Cassandra support http://riptano.com

Re: NYC Cassandra training

2010-07-14 Thread S Ahmed
How will we load the VM on our machines? Do we download it? Is it running Ubuntu? On Wed, Jul 14, 2010 at 11:11 AM, Jonathan Ellis jbel...@gmail.com wrote: Turns out we can get a list from Eventbrite: http://www.eventbrite.com/org/474011012?s=1926097 On Tue, Jul 13, 2010 at 3:09 PM,

Re: Too many open files [was Re: Minimizing the impact of compaction on latency and throughput]

2010-07-14 Thread Peter Schuller
[snip] I'm not sure that is the case. When the server gets into the unrecoverable state, the repeating exceptions are indeed SocketException: Too many open files. [snip] Although this is unquestionably a network error,  I don't think it is actually a network problem per se, as the maximum

get_range_slices return the same rows

2010-07-14 Thread shimi
I wrote code that iterates over all the rows using get_range_slices. For the first call I use a KeyRange from to . For all the others I use from the last key that I got in the previous iteration to . I always get the same rows that I got in the previous iteration. I tried changing the batch size
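
A minimal sketch of the paging pattern described above, not the poster's code. It assumes the start key of a range is inclusive, so every page after the first begins with the last key of the previous page and that key has to be skipped. `fake_get_range_slices` and `ROWS` are hypothetical in-memory stand-ins for the real Thrift call and the stored rows; the real call takes more arguments and, under RandomPartitioner, returns rows in token order rather than key order.

```python
from bisect import bisect_left

# Hypothetical data set standing in for the rows of a column family.
ROWS = sorted(["a", "b", "c", "d", "e", "f", "g"])


def fake_get_range_slices(start_key, count):
    """Hypothetical stand-in: return up to `count` keys >= start_key."""
    i = bisect_left(ROWS, start_key)
    return ROWS[i:i + count]


def iterate_all_keys(fetch, batch_size=3):
    """Yield every key once, paging with an inclusive start key."""
    if batch_size < 2:
        # With an inclusive start, a page of 1 would only ever return
        # the key we already have, so the loop could not make progress.
        raise ValueError("batch_size must be at least 2")
    start = ""
    first_page = True
    while True:
        page = fetch(start, batch_size)
        # After the first page, page[0] repeats the previous page's last key.
        new_keys = page if first_page else page[1:]
        for key in new_keys:
            yield key
        if len(page) < batch_size:
            return  # a short page means the range is exhausted
        start = page[-1]
        first_page = False
```

If each later page is identical to the one before it even though the start key advances, the paging logic is not at fault and the behaviour points at the server side.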

Re: NYC Cassandra training

2010-07-14 Thread Jonathan Ellis
I bring a USB drive for every attendee. The VM runs Debian. On Wed, Jul 14, 2010 at 10:20 AM, S Ahmed sahmed1...@gmail.com wrote: How will we load the VM on our machines? Do we download it? Is it running Ubuntu? On Wed, Jul 14, 2010 at 11:11 AM, Jonathan Ellis jbel...@gmail.com wrote:

My First Cassandra

2010-07-14 Thread Geoffry Roberts
All, Can anyone help? I followed the instructions for a single node installation of Cassandra. I tried to start it and got: ERROR 08:13:53,499 Exception encountered during startup. java.io.StreamCorruptedException: invalid stream header: 61696E5D at

Re: get_range_slices return the same rows

2010-07-14 Thread Jonathan Ellis
This is a bug. If you can give us data to reproduce it with, we can fix it faster. On Wed, Jul 14, 2010 at 10:29 AM, shimi shim...@gmail.com wrote: I wrote code that iterates over all the rows using get_range_slices. For the first call I use a KeyRange from to . For all the others I use from the

Re: Too many open files [was Re: Minimizing the impact of compaction on latency and throughput]

2010-07-14 Thread Jorge Barrios
Thomas, I had a similar problem a few weeks back. I changed my code to make sure that each thread only creates and uses one Hector connection. It seems that client sockets are not being released properly, but I didn't have the time to dig into it. Jorge On Wed, Jul 14, 2010 at 8:28 AM, Peter

node down window

2010-07-14 Thread B. Todd Burruss
there is a window of time from when a node goes down and when the rest of the cluster actually realizes that it is down. what happens to writes during this time frame? does hinted handoff record these writes and then handoff when the down node returns? or does hinted handoff not kick in until

Re: Too many open files [was Re: Minimizing the impact of compaction on latency and throughput]

2010-07-14 Thread Jorge Barrios
Each of my top-level functions was allocating a Hector client connection at the top, and releasing it when returning. The problem arose when a top-level function had to call another top-level function, which led to the same thread allocating two connections. Hector was not releasing one of them
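
A sketch of one way to avoid the nested-allocation problem described above: keep a single connection per thread and let nested calls reuse it. This is not Hector's API. `FakeConnection`, `connection`, `inner` and `outer` are hypothetical names, and the open-connection counter exists only to make the behaviour visible.

```python
import threading
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a client connection; counts open instances."""
    open_count = 0

    def __init__(self):
        FakeConnection.open_count += 1

    def close(self):
        FakeConnection.open_count -= 1


_local = threading.local()  # each thread gets its own conn/depth attributes


@contextmanager
def connection():
    """Yield this thread's connection, opening it only at the outermost call."""
    depth = getattr(_local, "depth", 0)
    if depth == 0:
        _local.conn = FakeConnection()
    _local.depth = depth + 1
    try:
        yield _local.conn
    finally:
        _local.depth -= 1
        if _local.depth == 0:
            # The outermost caller is done: release the connection.
            _local.conn.close()
            del _local.conn


def inner():
    with connection():
        return FakeConnection.open_count


def outer():
    # A top-level function calling another top-level function.
    with connection():
        return inner()
```

With this arrangement `outer()` sees one open connection while `inner()` runs, and none remain open after it returns.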

Re: node down window

2010-07-14 Thread Jonathan Ellis
On Wed, Jul 14, 2010 at 1:43 PM, B. Todd Burruss bburr...@real.com wrote: there is a window of time from when a node goes down and when the rest of the cluster actually realizes that it is down. what happens to writes during this time frame?  does hinted handoff record these writes and then

key types and grouping related rows together

2010-07-14 Thread S Ahmed
Where is the link that describes the various key types and their impact on sorting? (I believe I read it before, but can't seem to find it now.) My application is multi-tenant, so I need the keys to represent things like: website1123 + contentID or website3454 + userID. And for range
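
A small sketch of how tenant-prefixed keys group together when keys are compared lexicographically, as they would be under an order-preserving partitioner. The ":" separator and the helpers `make_key`, `tenant_range` and `scan` are assumptions for illustration, and the sorted list is a stand-in for the key order on the ring. Under RandomPartitioner keys are placed by hash, so this kind of prefix range scan would not apply.

```python
from bisect import bisect_left

SEP = ":"  # assumed separator; it keeps "website11" from matching "website1123"


def make_key(tenant, item_id):
    """Build a row key of the form tenant + separator + item id."""
    return f"{tenant}{SEP}{item_id}"


def tenant_range(tenant):
    """Return (start, end) bounding every key of one tenant; end is exclusive."""
    start = tenant + SEP
    end = tenant + chr(ord(SEP) + 1)  # first string after all "tenant:" keys
    return start, end


def scan(sorted_keys, tenant):
    """Return the keys of one tenant from a lexicographically sorted list."""
    start, end = tenant_range(tenant)
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, end)
    return sorted_keys[lo:hi]


KEYS = sorted([
    make_key("website1123", "content7"),
    make_key("website1123", "user42"),
    make_key("website3454", "user9"),
    make_key("website11", "content1"),
])
```

Here `scan(KEYS, "website1123")` returns only that tenant's two keys, and `scan(KEYS, "website11")` does not pick up the `website1123` rows because the separator is part of the range bounds.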

Re: node down window

2010-07-14 Thread B. Todd Burruss
thx, but disappointing :) is this just something we have to live with and periodically repair the nodes? or is there future work to tighten up the window? thx On Wed, 2010-07-14 at 12:13 -0700, Jonathan Ellis wrote: On Wed, Jul 14, 2010 at 1:43 PM, B. Todd Burruss bburr...@real.com wrote:

timestamps and batch_mutation

2010-07-14 Thread Aaron Morton
Is it OK or recommended to use the same timestamp value for all Column and Deletion records sent in a batch mutation? Am thinking of cases where there is a potential for multiple clients to update the same key (with multiple columns) at the same time. In the use case it's acceptable, as the client
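
A sketch of the idea in the question: take one timestamp per batch and stamp every column write and deletion with it. The plain dicts are hypothetical stand-ins for the Thrift mutation structures, `build_batch` and `now_micros` are illustrative names, and microseconds since the epoch is assumed here as the timestamp unit because it is a common client convention.

```python
import time


def now_micros():
    """Current time in microseconds since the epoch (assumed convention)."""
    return int(time.time() * 1_000_000)


def build_batch(columns, deletions, timestamp=None):
    """Build a list of mutations that all carry the same timestamp.

    columns   -- dict of column name -> value to write
    deletions -- iterable of column names to delete
    """
    ts = now_micros() if timestamp is None else timestamp
    mutations = [
        {"op": "insert", "name": name, "value": value, "timestamp": ts}
        for name, value in columns.items()
    ]
    mutations += [
        {"op": "delete", "name": name, "timestamp": ts}
        for name in deletions
    ]
    return mutations


batch = build_batch({"email": "a@example.com", "age": "30"}, ["nickname"])
```

Because every entry shares one timestamp, two clients racing on the same key resolve column by column against the other client's single timestamp, rather than against a spread of slightly different values within one batch.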

Re: key types and grouping related rows together

2010-07-14 Thread Aaron Morton
The key structure you have should group the keys based on the website. There are some differences between range queries with RP and OPP; this article may help: http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/ Aaron On 15 Jul, 2010, at 08:44 AM, S Ahmed

Bootstrap question

2010-07-14 Thread Anthony Molinaro
Hi, I have a 0.6.3 cluster which contains 6 nodes. I added 6 new nodes by setting AutoBootstrap to true and setting an InitialToken on each new node, then waiting for the Bootstrapping message in the log before starting another. Then I've been watching the logs on the old boxes waiting to see

Re: node down window

2010-07-14 Thread Jonathan Ellis
Coordination in a distributed system is difficult. I don't think we can fix HH's existing edge cases without introducing other, more complicated edge cases. So weekly-or-so repair will remain a common maintenance task for the foreseeable future. On Wed, Jul 14, 2010 at 4:17 PM, B. Todd Burruss

Re: timestamps and batch_mutation

2010-07-14 Thread Jonathan Ellis
It is good style but may not be necessary. On Wed, Jul 14, 2010 at 4:54 PM, Aaron Morton aa...@thelastpickle.com wrote: Is it OK or recommended to use the same timestamp value for all Column and Deletion records sent in a batch mutation? Am thinking of cases where there is a potential for

Re: Bootstrap question

2010-07-14 Thread Jonathan Ellis
Each node logs what token it is going to bootstrap to. Who owns the ranges that contain those tokens? On Wed, Jul 14, 2010 at 5:58 PM, Anthony Molinaro antho...@alumni.caltech.edu wrote: Hi,  I have a 0.6.3 cluster which contains 6 nodes.  I added 6 new nodes by setting AutoBootstrap to true

Bootstrap Token collision

2010-07-14 Thread Mubarak Seyed
The cluster nodes were running fine. When I restarted them to modify the JVM heap settings, two of the nodes did not rejoin the cluster and threw a Bootstrap Token collision error. Any idea how to fix this error? ERROR [GMFD:1] 2010-07-15 01:23:13,756 DebuggableThreadPoolExecutor.java (line 101) Error in

Re: key types and grouping related rows together

2010-07-14 Thread Schubert Zhang
For your apps, how about this schema: key: website1123 columnName: UserID ... On Thu, Jul 15, 2010 at 6:13 AM, Aaron Morton aa...@thelastpickle.com wrote: The key structure you have should group the keys based on the website. There are some differences between range queries with RP and OPP; this

Seed and nodetool

2010-07-14 Thread Claire Chang
I have 3 nodes A, B, C with RF=3. When I configure the cluster, and before it starts taking any read/write requests, I first start A, with A itself as the seed (following the instructions on the wiki), then start B (with A as the seed) and then start C (also with A as the seed). B and C seem to join the

Re: Seed and nodetool

2010-07-14 Thread Claire Chang
BTW, A is 192.168.11.29, B is 192.168.11.28, C is 192.168.11.27. From the result of nodetool ring, does it mean that B thinks A and C are down and C thinks B is down? I tried to restart B and for a brief moment, I didn't get this problem (all the nodes are all from nodetool) but after a while, this

Data in Cassandra

2010-07-14 Thread Hendro Kaskus
Hi everyone, I'm a newbie to Cassandra :D. I am trying to insert data from MySQL into Cassandra. The data dump from MySQL is about 11 MB (64716 records). But when I insert it into Cassandra, I think the data becomes bigger than in MySQL. Is that true? Thanks

Re: Seed and nodetool

2010-07-14 Thread Aaron Morton
Can you do an insert with CL ALL? Are there any ERRORs in the log file? Try turning the logging up to TRACE and see what's happening. Check that B can see A by ssh'ing into B and using nodetool from there to connect to A. Do you have any switches / firewalls between the nodes? Could this be