What is consuming the heap?

2010-07-20 Thread 王一锋
In my cluster, I have set both KeysCached and RowsCached of my column family on all nodes to 0, but it still happened that a few nodes crashed because of OutOfMemory (from the gc.log, a full gc wasn't able to free up any memory space), what else can be consuming the heap? heap size is 10G and

SV: What is consuming the heap?

2010-07-20 Thread Thorvaldsson Justus
A supercolumn/column must fit into node memory. Could that be it? /Justus From: 王一锋 [mailto:wangyif...@aspire-tech.com] Sent: 20 July 2010 08:48 To: user Subject: What is consuming the heap? In my cluster, I have set both KeysCached and RowsCached of my column family on all nodes to 0, but it

RE: A very short summary on Cassandra for a book

2010-07-20 Thread Sanjay Sharma
Hi Jonathan, I fear 'row-oriented' could fuel the holy war between 'row-based RDBMS' and 'column-oriented NoSQL databases'. Some related reads here: http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html and http://en.wikipedia.org/wiki/Column-oriented_DBMS

Re: SV: What is consuming the heap?

2010-07-20 Thread 王一锋
No, I don't think so, because I'm not using supercolumns and the size of a column will not exceed 1M. 2010-07-20 From: Thorvaldsson Justus Sent: 2010-07-20 14:52:22 To: 'user@cassandra.apache.org' Cc: Subject: SV: What is consuming the heap? Supercolumn/column must fit into node memory It

Re: Data from multiple tables (Join Data)

2010-07-20 Thread bujji
hi Aaron, Thanks for your reply I can integrate some transaction mechanism with Cassandra so that i can do the transactions. but is it possible to get data from more than one table without much overload and in an efficient way ? give me some good example if possible... Thanks, Visu On Tue,

Re: Define keyspaces in cassandra 0.7

2010-07-20 Thread GH
Here is a snippet out of some of my test code to try. I cut out most of the irrelevant bits, hope it works for you... (the original code worked here :-) TSocket socket = new TSocket("localhost", 9160); TTransport transport; transport = socket;

Re: Data from multiple tables (Join Data)

2010-07-20 Thread aaron morton
I'm not sure what your overload concern is. You either need to make multiple requests or de-normalise so that your query can be resolved from one CF. There are no joins. Try it and see would be the best advice. You can always add more nodes to the cluster. Aaron On 20 Jul 2010, at 19:19,

SV: UnavailableException on QUORUM write

2010-07-20 Thread Per Olesen
Hi, Think I might have found out the problem. I had only one seed node, and when that node is down, they all give UnavailableException. Guess at least one seed needs to be up then? Sounds fair. /Per From: Per Olesen [...@trifork.com] Sent: 9 July 2010

Re: How to stop Cassandra running in embeded mode

2010-07-20 Thread Bjorn Borud
Jonathan Ellis jbel...@gmail.com writes: there's some support for this in 0.7 (see http://issues.apache.org/jira/browse/CASSANDRA-1018) but fundamentally it's not really designed to be started and stopped multiple times within the same process. I am currently struggling with some of the same

Re: How to get the 'system' keyspace info?

2010-07-20 Thread Jonathan Ellis
internal error means check the cassandra log for the stacktrace On Mon, Jul 19, 2010 at 10:36 PM, ChingShen chingshenc...@gmail.com wrote: cassandra get system.LocationInfo['L'] Exception Internal error processing get_slice What's wrong? Thanks. Shen -- Jonathan Ellis Project Chair,

Re: What is consuming the heap?

2010-07-20 Thread Jonathan Ellis
you should post the full stack trace. 2010/7/20 王一锋 wangyif...@aspire-tech.com: In my cluster, I have set both KeysCached and RowsCached of my column family on all nodes to 0, but it still happened that a few nodes crashed because of OutOfMemory (from the gc.log, a full gc wasn't able to free

Re: UnavailableException on QUORUM write

2010-07-20 Thread Jonathan Ellis
Seed should only be important when joining the cluster. You're using the Thrift API, right? On Tue, Jul 20, 2010 at 5:34 AM, Per Olesen p...@trifork.com wrote: Hi, Think I might have found out the problem. I had only one seed node, and when that node is down, they all give

Re: Cassandra benchmarking on Rackspace Cloud

2010-07-20 Thread Juho Mäkinen
I managed to run a few benchmarks. Servers / r/s: 1 / 64.5k, 2 / 59.5k. The configuration: Client: Machine with four Quad Core Intel Xeon CPU E5520 @ 2.27GHz CPUs (total 16 cores), 4530 bogomips per core. 12 GB ECC corrected memory. Supermicro mainboard (not sure about exact type).

Re: How to stop Cassandra running in embeded mode

2010-07-20 Thread Jesse McConnell
separate jvm is the only mechanism to 'shutdown' in a test scenario right now and it's unlikely to change in the short term so designing around forking is your best bet cheers, jesse -- jesse mcconnell jesse.mcconn...@gmail.com On Tue, Jul 20, 2010 at 05:47, Bjorn Borud bbo...@gmail.com

Re: Cassandra benchmarking on Rackspace Cloud

2010-07-20 Thread Peter Schuller
But what's then the point with adding nodes into the ring? Disk speed! Well, it may also be cheaper to service an RPC request than service a full read or write, even in terms of CPU. But: Even taking into account that requests are distributed randomly, the cluster should still scale. You will

Re: What is consuming the heap?

2010-07-20 Thread Peter Schuller
heap size is 10G and the load of data per node was around 300G, 16-core CPU, Are the 300 GB made up of *really* small values? Per SS table bloom filters do consume memory, but you'd have to have a *lot* of *really* small values for a 300 GB database to cause bloom filters to be a significant

Script 'hangs' when i stop 1 cassandra node (of 4 nodes)

2010-07-20 Thread Pieter Maes
Hi, I'm currently using Cassandra 0.6.3 with php thrift (svn r959516) in the phpcassa wrapper (last git + a fix of mine that fixes strange timeouts..). (yeah i use php, don't shoot me for it) (i also mailed that mailing list, but no answer yet from there) When i was running my migration script

does Net::Cassandra work for 0.6.3?

2010-07-20 Thread Alexander Rothenberg
Hi, we consider using cassandra to replace a lot of old logging-mechanisms to record pageviews/userdata/searchparams/hits etc from websites. (later, we want to monitor those data). Looking at the API and ways to communicate to the cassandra-server, i would like to use the perl-client

Re: Cassandra benchmarking on Rackspace Cloud

2010-07-20 Thread Ryan King
On Tue, Jul 20, 2010 at 6:20 AM, Juho Mäkinen juho.maki...@gmail.com wrote: I managed to run a few benchmarks. Servers   r/s   1        64.5k   2        59.5k The configuration: Client: Machine with four Quad Core Intel Xeon CPU E5520 @ 2.27Ghz cpus (total 16 cores), 4530 bogomips per

Re: UnavailableException on QUORUM write

2010-07-20 Thread Jonathan Ellis
On Tue, Jul 20, 2010 at 6:40 AM, Per Olesen p...@trifork.com wrote: Seed should only be important when joining the cluster.  You're using the Thrift API, right? Yep! And when one of my non-seed nodes in my 3 node cluster is down, I do NOT get the exception. Anyway, guess I need to try and

Ran into an issue where Cassandra Crashed when running out of heap space

2010-07-20 Thread Dathan Pattishall
 INFO [HINTED-HANDOFF-POOL:1] 2010-07-20 15:10:43,721 HintedHandOffManager.java (line 210) Finished hinted handoff of 0 rows to endpoint /10.129.28.23 ERROR [pool-1-thread-37895] 2010-07-20 15:10:51,622 CassandraDaemon.java (line 83) Uncaught exception in thread Thread[pool-1-thread-37895,5,main]

Re: Ran into an issue where Cassandra Crashed when running out of heap space

2010-07-20 Thread Peter Schuller
CassandraDaemon.java (line 83) Uncaught exception in thread Thread[pool-1-thread-37895,5,main] java.lang.OutOfMemoryError: Java heap space     at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:296)     at

Re: Cassandra benchmarking on Rackspace Cloud

2010-07-20 Thread Peter Schuller
(I'm hoping to have time to run my test on EC2 tonight; will see.) Well, I needed three c1.xlarge EC2 instances running py_stress to even saturate more than one core on the c1.xlarge instance running a single cassandra node (at roughly 21k reqs/second)... Depending on how reliable vmstat/top is

more questions on Cassandra ACID properties

2010-07-20 Thread Alex Yiu
Hi, I have more questions on Cassandra ACID properties. Say, I have a row that has 3 columns already: colA, colB and colC And, if two *concurrent* clients perform a different insert(...) into the same row, one insert is for colD and the other insert is for colE. Then, Cassandra would guarantee

testing please ignore

2010-07-20 Thread Alex Yiu
testing please ignore

Re: Understanding atomicity in Cassandra

2010-07-20 Thread Patricio Echagüe
Hi, regarding the retrying strategy, I understand that it might make sense assuming that the client can actually perform a retry. We are trying to build a fault tolerance solution based on Cassandra. In some scenarios, the client machine can go down during a transaction. Would it be bad design

Re: Ran into an issue where Cassandra Crashed when running out of heap space

2010-07-20 Thread Dathan Pattishall
The storage structure is rather simple. For every 1 key there is 1 column and a timestamp for that column. <ColumnFamily Name="Standard2" CompareWith="UTF8Type" /> We don't enable pulling a huge amount of data and all other nodes are up servicing the same request. I suspect there may be another

Re: Ran into an issue where Cassandra Crashed when running out of heap space

2010-07-20 Thread Peter Schuller
Attaching Jconsole shows that there is a growth of memory and weird spikes. Unfortunately I did not take a screen shot of the growth of the spike over time. I'll do that when it occurs again. Note that expected behavior for CMS is to have lots of small ups and downs as a result of young

Re: Ran into an issue where Cassandra Crashed when running out of heap space

2010-07-20 Thread Ryan King
On Tue, Jul 20, 2010 at 1:28 PM, Peter Schuller peter.schul...@infidyne.com wrote: Attaching Jconsole shows that there is a growth of memory and weird spikes. Unfortunately I did not take a screen shot of the growth of the spike over time. I'll do that when it occurs again. Note that expected

Re: Understanding atomicity in Cassandra

2010-07-20 Thread Jonathan Ellis
2010/7/20 Patricio Echagüe patric...@gmail.com: Would it be bad design to store all the data that need to be consistent under one big key? That really depends how unnatural it is from a query perspective. :) -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source

Re: more questions on Cassandra ACID properties

2010-07-20 Thread Jonathan Shook
You are correct. In this case, Cassandra would journal two writes to the same logical row, but they would be 2 independent writes. Writes do not depend on reads, so they are self-contained. If either column exists already, it will be overwritten. These journaled actions would then be applied to
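The point about independent, column-level writes can be illustrated with a toy model. This is only a sketch in plain Java, not Cassandra code: the class `RowWriteSketch` and its methods are hypothetical, the column names follow the colA..colE scenario from the question, and the real server resolves conflicting writes by timestamp rather than by arrival order.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

public class RowWriteSketch {
    // Toy stand-in for one logical row: column name -> value.
    // Hypothetical model only, not Cassandra's storage engine.
    private final Map<String, String> columns = new ConcurrentHashMap<>();

    /** A write touches only the named column: no read first, overwrite if present. */
    void insert(String column, String value) {
        columns.put(column, value);
    }

    /** Sorted copy of the row as it currently stands. */
    Map<String, String> snapshot() {
        return new TreeMap<>(columns);
    }

    public static void main(String[] args) throws InterruptedException {
        RowWriteSketch row = new RowWriteSketch();
        row.insert("colA", "a");
        row.insert("colB", "b");
        row.insert("colC", "c");

        // Two concurrent clients, each writing a different column of the same row.
        Thread clientOne = new Thread(() -> row.insert("colD", "d"));
        Thread clientTwo = new Thread(() -> row.insert("colE", "e"));
        clientOne.start();
        clientTwo.start();
        clientOne.join();
        clientTwo.join();

        // Both writes survive because neither depends on the other.
        System.out.println(row.snapshot().keySet());
    }
}
```

Whichever thread runs first, the row ends up with all five columns, which is the behaviour described in the reply.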

Re: Bootstrap question

2010-07-20 Thread Anthony Molinaro
I see this in the old nodes DEBUG [WRITE-/10.220.198.15] 2010-07-20 21:15:50,366 OutboundTcpConnection.java (line 142) attempting to connect to /10.220.198.15 INFO [GMFD:1] 2010-07-20 21:15:50,391 Gossiper.java (line 586) Node /10.220.198.15 is now part of the cluster INFO [GMFD:1] 2010-07-20

Re: more questions on Cassandra ACID properties

2010-07-20 Thread Alex Yiu
Hi, all, (Jonathan Ellis, Jonathan Shook, Aaron Morton) Thanks for the confirmation. JonE, the updated wording has been added to the wiki page w.r.t. the insert and mutation API. Regards, Alex Yiu On Tue, Jul 20, 2010 at 2:02 PM, Jonathan Ellis jbel...@gmail.com wrote: On Tue, Jul 20, 2010 at

Re: Understanding atomicity in Cassandra

2010-07-20 Thread Alex Yiu
Hi, Patricio, It's hard to comment on your original questions without knowing details of your own domain specific data model and data processing expectation. W.R.T. lumping things into one big row, there is a limitation on data model in Cassandra. You got CF and SCF. That is, you have only 2

how come some nodes will drop nodes from the ring and not others?

2010-07-20 Thread Dathan Pattishall
dsh is a distributed shell that basically runs the same command on multiple servers. Notice that cass03 sees all 4 servers, yet the other 3 only see three servers? Storage-conf.xml is the same among all nodes, i.e. <Seeds> <Seed>10.129.28.14</Seed> <Seed>10.129.28.20</Seed>

RE: how come some nodes will drop nodes from the ring and not others?

2010-07-20 Thread Stu Hood
Did you copy the data directories from one node to the others? http://wiki.apache.org/cassandra/FAQ#cloned -Original Message- From: Dathan Pattishall datha...@gmail.com Sent: Tuesday, July 20, 2010 6:09pm To: user@cassandra.apache.org Subject: how come some nodes will drop nodes from the

Re: how come some nodes will drop nodes from the ring and not others?

2010-07-20 Thread Dathan Pattishall
No did not copy the data directories from one node to another. This data is new data, newly created from scratch. On Tue, Jul 20, 2010 at 4:17 PM, Stu Hood stu.h...@rackspace.com wrote: Did you copy the data directories from one node to the others? http://wiki.apache.org/cassandra/FAQ#cloned

Re: How to get the 'system' keyspace info?

2010-07-20 Thread ChingShen
Thanks Jonathan Ellis, I got an error message as below: ERROR [pool-1-thread-1] 2010-07-21 08:51:46,582 Cassandra.java (line 1242) Internal error processing get_slice java.lang.RuntimeException:* No replica strategy configured for system* Because the system keyspace is for Cassandra internals,

Re: what causes a cassandra to block and throw a null exception

2010-07-20 Thread Dathan Pattishall
Just sent one of the nodes back.
Pool Name        Active  Pending  Completed
STREAM-STAGE          0        0          0
RESPONSE-STAGE        0        0     151071
ROW-READ-STAGE        0        0     100398
LB-OPERATIONS

Re: Estimated release for Cassandra 0.6.4

2010-07-20 Thread CassUser CassUser
Thanks Eric. On Tue, Jul 20, 2010 at 8:14 PM, Eric Evans eev...@rackspace.com wrote: On Tue, 2010-07-20 at 13:53 -0700, CassUser CassUser wrote: Is there a release date (or approximate date) for cassandra 0.6.4. We are mainly concerned about the Cassandra-1042 patch. The reason we don't

Re: Re: What is consuming the heap?

2010-07-20 Thread 王一锋
So the bloom filters reside in memory completely? We do have a lot of small values, hundreds of millions of columns in a columnfamily. I counted the total size of the *-Filter.db files in my keyspace; it's 436,747,815 bytes. I guess this means it won't consume a major part of the 10G heap space
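A quick back-of-the-envelope check of that guess, as a sketch only. It assumes that "10G" means 10 GiB and that the on-disk size of the *-Filter.db files roughly matches what the filters occupy once loaded; neither is confirmed in the thread. The class name `BloomFilterHeapShare` is made up for illustration.

```java
public class BloomFilterHeapShare {
    /** Fraction of the heap used if every filter byte were resident on the heap. */
    static double heapFraction(long filterBytes, long heapBytes) {
        return (double) filterBytes / heapBytes;
    }

    public static void main(String[] args) {
        long filterBytes = 436_747_815L;            // total *-Filter.db size from the message
        long heapBytes = 10L * 1024 * 1024 * 1024;  // assumption: "10G" means 10 GiB
        System.out.printf("%.1f%% of heap%n", 100 * heapFraction(filterBytes, heapBytes));
    }
}
```

Under those assumptions the filters come to roughly 4% of the heap, which supports the guess that they are not the main consumer.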

get the latest column fails in cassandra 7

2010-07-20 Thread Bujji4Tech
Hi all, I am trying Cassandra 0.7 (using the latest build) and have a problem getting the latest column in a row. My code is here: SlicePredicate predicate = new SlicePredicate(); predicate.slice_range = new SliceRange(new byte[0], new byte[0], true, 1); ColumnParent column_parent