Re: Regarding Cassandra Scalability
Hi Paul, I am not under any pressure to build software with Cassandra right now; I am just studying and exploring it, which is why I am so curious about it. OK, I will continue my study and wait for better documentation. Dir. On Mon, Apr 19, 2010 at 1:44 PM, Paul Prescod pres...@gmail.com wrote: On Sun, Apr 18, 2010 at 9:14 AM, dir dir sikerasa...@gmail.com wrote: Hi Gary, The main reason is that the compaction operation (removing deleted values) currently requires that an entire row be read into memory. Thank you for your explanation. But I still do not understand what you mean. Do you have a pressing need to use Cassandra right now, before version 1.0 is even available? That limitation will go away before 1.0, so you could simply wait and not worry about it. Documentation will also be much more complete in the future. Paul Prescod
cassandra monitoring
Hi, What is the preferred way of monitoring Cassandra clusters? Is Cassandra integrated with Ganglia? Thank you very much! Best regards, Daniel.
0.6 insert performance .... Re: [RELEASE] 0.6.1
I wonder if anyone can use the new logging of GC activity (CASSANDRA-813) to confirm this: http://www.slideshare.net/schubertzhang/cassandra-060-insert-throughput - m. On Sun, Apr 18, 2010 at 6:58 PM, Eric Evans eev...@rackspace.com wrote: Hot on the heels of 0.6.0 comes our latest, 0.6.1. This stable point release contains a number of important bugfixes[1] and is a painless upgrade from 0.6.0. Enjoy! [1]: http://bit.ly/9NqwAb (changelog) -- Eric Evans eev...@rackspace.com
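Independently of CASSANDRA-813, one way to confirm GC involvement is to enable the JVM's own GC logging. A minimal sketch, assuming the stock cassandra.in.sh; these are standard HotSpot flags, and the log path is purely illustrative:

    # Append to JVM_OPTS in cassandra.in.sh; point the log anywhere the cassandra user can write.
    JVM_OPTS="$JVM_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/cassandra/gc.log"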
RE: 0.6 insert performance .... Re: [RELEASE] 0.6.1
I'm seeing some issues like this as well; in fact, I think seeing your graphs has helped me understand the dynamics of my cluster better. Using some ballpark figures for inserting single-column objects of ~500 bytes onto individual nodes (not when combined as a cluster):

Node1: Inserts 12000/s
Node2: Inserts 12000/s
Node3: Inserts 9000/s
Node4: Inserts 6000/s

When combined as a cluster, inserts are around 7000/s (replication factor of 2). When GC kicks in anywhere in the cluster, quorum writes slow down for everyone associated with that node. And with 4 nodes, garbage collection will be going on somewhere almost all the time. So while I should be able to write more than 12,000/second, the slowest node in the cluster seems to drag the faster nodes down. I'm still running tests of various combinations to see where things work out. From: Masood Mortazavi [mailto:masoodmortaz...@gmail.com] Sent: Monday, April 19, 2010 6:15 AM To: user@cassandra.apache.org; d...@cassandra.apache.org Subject: 0.6 insert performance Re: [RELEASE] 0.6.1 I wonder if anyone can use the new logging of GC activity (CASSANDRA-813) to confirm this: http://www.slideshare.net/schubertzhang/cassandra-060-insert-throughput - m. On Sun, Apr 18, 2010 at 6:58 PM, Eric Evans eev...@rackspace.com wrote: Hot on the heels of 0.6.0 comes our latest, 0.6.1. This stable point release contains a number of important bugfixes[1] and is a painless upgrade from 0.6.0. Enjoy! [1]: http://bit.ly/9NqwAb (changelog) -- Eric Evans eev...@rackspace.com
Re: Regarding Cassandra Scalability
On Sun, Apr 18, 2010 at 11:14, dir dir sikerasa...@gmail.com wrote: Hi Gary, The main reason is that the compaction operation (removing deleted values) currently requires that an entire row be read into memory. Thank you for your explanation. But I still do not understand what you mean. When you delete a column in Cassandra, the data is not really deleted. Instead, a flag is turned on indicating the column is no longer valid (we call it a 'tombstone'). During compaction the column family is scanned and the tombstones are truly deleted. In my opinion, the row contents must actually fit in available memory; if they do not, the software will raise an out-of-memory exception. Since it is true that the row contents must fit in available memory, why did you say that is a problem which Cassandra cannot solve? It was not correct of me to say it is a problem that Cassandra cannot solve. Memory-efficient compactions will be addressed. You say the compaction operation requires that an entire row be read into memory; is this an out-of-memory problem? When do we need to perform a compaction operation, and in what situation shall we perform it? You will need to address the large rows yourself (consider breaking them up). You can identify these rows during compaction by setting RowWarningThresholdInMB in storage-conf.xml. When a big enough row comes along, it is logged so you can go back later and address the problem. Regards, Gary. Thank You. Dir.
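For reference, the setting Gary mentions is a single element in storage-conf.xml; a minimal sketch, where the 64 MB threshold is only an illustrative value:

    <!-- Log a warning when compaction encounters a row larger than this many megabytes. -->
    <RowWarningThresholdInMB>64</RowWarningThresholdInMB>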
RE: Cassandra Java Client
May I take this chance to share this link here: http://code.google.com/p/jassandra/ It is currently based on the Cassandra 0.6 Thrift APIs. The classes ThriftCriteria and ThriftColumnFamily use the Thrift API directly. Also, the site itself has test code, which works on the Jassandra abstraction. Dop From: Nirmala Agadgar [mailto:nirmala...@gmail.com] Sent: Friday, April 16, 2010 5:56 PM To: user@cassandra.apache.org Subject: Cassandra Java Client Hi, Can anyone tell how to implement a client that can insert data into Cassandra in Java? Any code or guidelines would be helpful. - Nirmala
Re: Cassandra Java Client
How is Jassandra different from http://github.com/rantav/hector ? On Mon, Apr 19, 2010 at 9:21 AM, Dop Sun su...@dopsun.com wrote: May I take this chance to share this link here: http://code.google.com/p/jassandra/ It is currently based on the Cassandra 0.6 Thrift APIs. The classes ThriftCriteria and ThriftColumnFamily use the Thrift API directly. Also, the site itself has test code, which works on the Jassandra abstraction. Dop From: Nirmala Agadgar [mailto:nirmala...@gmail.com] Sent: Friday, April 16, 2010 5:56 PM To: user@cassandra.apache.org Subject: Cassandra Java Client Hi, Can anyone tell how to implement a client that can insert data into Cassandra in Java? Any code or guidelines would be helpful. - Nirmala
RE: Cassandra Java Client
Well, there are a couple of points behind why Jassandra was created: 1. First of all, I wanted to create something like this because I come from a JDBC background and am familiar with the Hibernate API. The ICriteria interface (created for querying) is inspired by Hibernate's Criteria API. Actually, maybe because of this background, it cost me a lot of effort to understand Cassandra in the beginning, and the Thrift API also takes time to learn. 2. Jassandra creates a layer which removes the direct link to the underlying Thrift API (including the exceptions, the ConsistencyLevel enumeration, etc.). I highlight this point because I believe clients of Jassandra will benefit from implementation changes in the future: for example, if Cassandra provides a better Thrift API for selecting the columns for a list of keys or SCFs, or deprecates some structures or exceptions, the client may not need to change. Of course, if Jassandra fails to prove itself, this is not actually an advantage. :) 3. Jassandra is designed to be a JDBC-like API, no less, no more. It strives to use the best API to do the querying (with token, key, SCF/CF) and the CRUD, but no more than that. For example, it does not cover anything like object mapping. But it should cover all the API functionality Thrift provides. These 3 points are different from Hector (I should be honest that I have not tried it before; the feeling of difference comes from the sample code Hector provides). So, the API Jassandra abstracts looks something like this:

    // 1. Get a connection like this
    IConnection connection = DriverManager.getConnection("thrift://localhost:9160", info);
    try {
      // 2. Get a KeySpace by name
      IKeySpace keySpace = connection.getKeySpace("Keyspace1");
      // 3. Get a ColumnFamily by name
      IColumnFamily cf = keySpace.getColumnFamily("Standard2");
      // 4. Insert like this
      long now = System.currentTimeMillis();
      ByteArray nameFirst = ByteArray.ofASCII("first");
      ByteArray nameLast = ByteArray.ofASCII("last");
      ByteArray nameAge = ByteArray.ofASCII("age");
      ByteArray valueLast = ByteArray.ofUTF8("Smith");
      IColumn colFirst = new Column(nameFirst, ByteArray.ofUTF8("John"), now);
      cf.insert(userName, colFirst);
      IColumn colLast = new Column(nameLast, valueLast, now);
      cf.insert(userName, colLast);
      IColumn colAge = new Column(nameAge, ByteArray.ofLong(42), now);
      cf.insert(userName, colAge);
      // 5. Select like this
      ICriteria criteria = cf.createCriteria();
      criteria.keyList(Lists.newArrayList(userName)).columnRange(nameAge, nameLast, 10);
      Map<String, List<IColumn>> map = criteria.select();
      List<IColumn> list = map.get(userName);
      Assert.assertEquals(3, list.size());
      Assert.assertEquals(valueLast, list.get(2).getValue());
      // 6. Delete like this
      cf.delete(userName, colFirst);
      map = criteria.select();
      Assert.assertEquals(2, map.get(userName).size());
      // 7. Get count like this
      criteria = cf.createCriteria();
      criteria.keyList(Lists.newArrayList(userName));
      int count = criteria.count();
      Assert.assertEquals(2, count);
    } finally {
      // 8. Don't forget to close the connection.
      connection.close();
    }

-Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Monday, April 19, 2010 10:35 PM To: user@cassandra.apache.org Subject: Re: Cassandra Java Client How is Jassandra different from http://github.com/rantav/hector ? On Mon, Apr 19, 2010 at 9:21 AM, Dop Sun su...@dopsun.com wrote: May I take this chance to share this link here: http://code.google.com/p/jassandra/ It is currently based on the Cassandra 0.6 Thrift APIs. The classes ThriftCriteria and ThriftColumnFamily use the Thrift API directly. Also, the site itself has test code, which works on the Jassandra abstraction. Dop From: Nirmala Agadgar [mailto:nirmala...@gmail.com] Sent: Friday, April 16, 2010 5:56 PM To: user@cassandra.apache.org Subject: Cassandra Java Client Hi, Can anyone tell how to implement a client that can insert data into Cassandra in Java? Any code or guidelines would be helpful. - Nirmala
tcp CLOSE_WAIT bug
Hi all, We have observed several connections between nodes in CLOSE_WAIT after several hours of operation. At node 87: netstat -tn | grep 7000

tcp  0  0  :::192.168.2.87:7000   :::192.168.2.88:57625  CLOSE_WAIT
tcp  0  0  :::192.168.2.87:7000   :::192.168.2.88:51541  CLOSE_WAIT
tcp  0  0  :::192.168.2.87:7000   :::192.168.2.88:58447  ESTABLISHED
tcp  0  0  :::192.168.2.87:7000   :::192.168.2.88:51313  CLOSE_WAIT
tcp  0  0  :::192.168.2.87:7000   :::192.168.2.88:52065  CLOSE_WAIT
tcp  0  0  :::192.168.2.87:7000   :::192.168.2.88:58218  CLOSE_WAIT
tcp  0  0  :::192.168.2.87:54986  :::192.168.2.88:7000   ESTABLISHED
tcp  0  0  :::192.168.2.87:7000   :::192.168.2.88:48272  CLOSE_WAIT
tcp  0  0  :::192.168.2.87:7000   :::192.168.2.88:55433  CLOSE_WAIT
tcp  0  0  :::192.168.2.87:59138  :::192.168.2.88:7000   ESTABLISHED
tcp  0  0  :::192.168.2.87:7000   :::192.168.2.88:39074  ESTABLISHED
tcp  0  0  :::192.168.2.87:7000   :::192.168.2.88:59088  CLOSE_WAIT
tcp  0  0  :::192.168.2.87:7000   :::192.168.2.88:34012  CLOSE_WAIT
tcp  0  0  :::192.168.2.87:7000   :::192.168.2.88:55806  CLOSE_WAIT
tcp  0  0  :::192.168.2.87:7000   :::192.168.2.88:42472  CLOSE_WAIT
tcp  0  0  :::192.168.2.87:7000   :::192.168.2.88:45033  CLOSE_WAIT

At the other node, 88: netstat -tn | grep 7000

tcp  0  0  :::192.168.2.88:7000   :::192.168.2.87:59138  ESTABLISHED
tcp  0  0  :::192.168.2.88:7000   :::192.168.2.87:46143  CLOSE_WAIT
tcp  0  0  :::192.168.2.88:7000   :::192.168.2.87:38202  CLOSE_WAIT
tcp  0  0  :::192.168.2.88:7000   :::192.168.2.87:55852  CLOSE_WAIT
tcp  0  0  :::192.168.2.88:7000   :::192.168.2.87:39208  CLOSE_WAIT
tcp  0  0  :::192.168.2.88:7000   :::192.168.2.87:55378  CLOSE_WAIT
tcp  0  0  :::192.168.2.88:7000   :::192.168.2.87:51061  CLOSE_WAIT
tcp  0  0  :::192.168.2.88:7000   :::192.168.2.87:44911  CLOSE_WAIT
tcp  0  0  :::192.168.2.88:58447  :::192.168.2.87:7000   ESTABLISHED
tcp  0  0  :::192.168.2.88:7000   :::192.168.2.87:59614  CLOSE_WAIT
tcp  0  0  :::192.168.2.88:7000   :::192.168.2.87:35033  CLOSE_WAIT
tcp  0  0  :::192.168.2.88:39074  :::192.168.2.87:7000   ESTABLISHED
tcp  0  0  :::192.168.2.88:7000   :::192.168.2.87:54986  ESTABLISHED
tcp  0  0  :::192.168.2.88:7000   :::192.168.2.87:54772  CLOSE_WAIT
tcp  0  0  :::192.168.2.88:7000   :::192.168.2.87:39925  CLOSE_WAIT
tcp  0  0  :::192.168.2.88:7000   :::192.168.2.87:38124  CLOSE_WAIT

The setup uses only two nodes, replication factor = 2, with the latest JDK 6u20 and Cassandra 0.6.0. AFAIK, CLOSE_WAIT indicates open sockets that were not closed properly. Has anyone experienced a similar problem? How do I find the root cause? Any help is appreciated.
Re: tcp CLOSE_WAIT bug
Thanks for the information. We do use connection pools with the Thrift client, and ThriftAddress is on port 9160. The problematic connections we found are all on port 7000, which is the internal communications port between nodes. I guess this is related to StreamingService. On Mon, Apr 19, 2010 at 23:46, Brandon Williams dri...@gmail.com wrote: On Mon, Apr 19, 2010 at 10:27 AM, Ingram Chen ingramc...@gmail.com wrote: Hi all, We have observed several connections between nodes in CLOSE_WAIT after several hours of operation: This is symptomatic of not pooling your client connections correctly. Be sure you're using one connection per thread, not one connection per operation. -Brandon -- Ingram Chen online share order: http://dinbendon.net blog: http://www.javaworld.com.tw/roller/page/ingramchen
RE: 0.6 insert performance .... Re: [RELEASE] 0.6.1
We see this behavior as well with 0.6; heap usage graphs look almost identical. The GC is a noticeable bottleneck; we've tried the JDK 6u19 and JRockit VMs. It basically kills any kind of soft real-time behavior. From: Masood Mortazavi [mailto:masoodmortaz...@gmail.com] Sent: Monday, April 19, 2010 4:15 AM To: user@cassandra.apache.org; d...@cassandra.apache.org Subject: 0.6 insert performance Re: [RELEASE] 0.6.1 I wonder if anyone can use the new logging of GC activity (CASSANDRA-813) to confirm this: http://www.slideshare.net/schubertzhang/cassandra-060-insert-throughput - m. On Sun, Apr 18, 2010 at 6:58 PM, Eric Evans eev...@rackspace.com wrote: Hot on the heels of 0.6.0 comes our latest, 0.6.1. This stable point release contains a number of important bugfixes[1] and is a painless upgrade from 0.6.0. Enjoy! [1]: http://bit.ly/9NqwAb (changelog) -- Eric Evans eev...@rackspace.com
Map/Reduce Cassandra Output
Unlike the wordcount example, my input source is a directory, and I have a split class and record reader defined. Also unlike wordcount, during reduce I need to insert into Cassandra. I notice the wordcount input retrieves a handle on a Cassandra client like this:

    TSocket socket = new TSocket(DatabaseDescriptor.getSeeds().iterator().next().getHostAddress(),
            DatabaseDescriptor.getThriftPort());
    TBinaryProtocol binaryProtocol = new TBinaryProtocol(socket, false, false);
    Cassandra.Client client = new Cassandra.Client(binaryProtocol);

Would all Hadoop nodes go to the same seed if I use this code to insert data, without balancing it? Has this been done somewhere in the Cassandra code already?
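For context, a minimal sketch of how such a handle is typically used to write a column with the 0.6 Thrift API; the keyspace, column family, key, and host names here are illustrative, not taken from the thread:

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;

    public class InsertSketch {
        public static void main(String[] args) throws Exception {
            TSocket socket = new TSocket("localhost", 9160); // a single node, for illustration
            socket.open();
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket, false, false));
            // Write one column: row "key1", column "name", column family "Standard1".
            ColumnPath path = new ColumnPath("Standard1");
            path.setColumn("name".getBytes("UTF-8"));
            client.insert("Keyspace1", "key1", path, "value".getBytes("UTF-8"),
                    System.currentTimeMillis(), ConsistencyLevel.ONE);
            socket.close();
        }
    }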
Re: [RELEASE] 0.6.0
On Wed, 14 Apr 2010 13:09:13 -0500 Ted Zlatanov t...@lifelogs.com wrote: TZ On Wed, 14 Apr 2010 12:23:19 -0500 Eric Evans eev...@rackspace.com wrote: EE On Wed, 2010-04-14 at 10:16 -0500, Ted Zlatanov wrote: Can it support a non-root user through /etc/default/cassandra? I've been patching the init script myself but was hoping this would be standard. EE It's the first item on debian/TODO, but, you know, patches welcome and EE all that. TZ The appended patch has been sufficient for me. Eric, do you need me to open a ticket for this, too, or is what I posted sufficient? Thanks Ted
Modelling assets and user permissions
Suppose I have a CF that holds some sort of assets that some users of my program have access to, and that some do not. In SQL-ish terms it would look something like this:

TABLE Assets (
    asset_id serial primary key,
    ...
);
TABLE Users (
    user_id serial primary key,
    user_name text
);
TABLE Permissions (
    asset_id integer references(Assets),
    user_id integer references(Users)
);

Now, I can generate UUIDs for my asset keys without any trouble, so the serial that I have in my pseudo-SQL Assets table isn't a problem. My problem is that I can't see a good way to model the relationship between user ids and assets. I see one way to do this, which has problems, and I think I sort of see a second way. The obvious way to do it is to have the Assets CF contain a SuperColumn that somehow enumerates the users allowed to see it, so when retrieving a specific Asset I can retrieve the user list and ensure that the user making the request is allowed to see it. This has quite a few problems. The foremost is that Cassandra doesn't appear to have much for conflict resolution (at least I can't find any docs on it), so if two processes try to add permissions to the same Asset, it looks like one process will win and I have no idea what happens to the loser. Another problem is that Cassandra's SuperColumns don't appear to be ideal for storing lists of things; they store maps, which isn't a terrible problem, but it feels like a bit of a mismatch in my design. A SuperColumn mapping from user_ids to an empty byte array seems like it should work pretty efficiently for checking whether a user has permissions on an Asset, but it also seems pretty evil. The other idea that I have is a separate CF for AssetPermissions that somehow stores pairs of asset_ids and user_names. I don't know what I'd use for a key in that situation, so I haven't really gotten too far in seeing what else is broken with that idea. I think it would get around the race condition, but I don't know how to do it, and I'm not sure how efficient it could be. What do people normally use in this situation? I assume it's a pretty common problem, but I haven't seen it in the various data modelling examples on the Wiki.
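To make the second idea concrete: with a plain (non-super) AssetPermissions CF keyed by asset id, each granted user becomes a column whose name is the user id and whose value is empty, and the permission check is a single-column get. A minimal sketch with the 0.6 Thrift API; all names ("Keyspace1", "AssetPermissions", the ids) are illustrative, and the client is assumed to be connected as in the other threads in this digest:

    import org.apache.cassandra.thrift.*;

    public final class PermissionSketch {
        // Grant user "user42" access to asset "asset-uuid-1": an empty-valued
        // column whose name is the user id, in a row keyed by the asset id.
        static void grant(Cassandra.Client client) throws Exception {
            ColumnPath perm = new ColumnPath("AssetPermissions");
            perm.setColumn("user42".getBytes("UTF-8"));
            client.insert("Keyspace1", "asset-uuid-1", perm, new byte[0],
                    System.currentTimeMillis(), ConsistencyLevel.QUORUM);
        }

        // Check: the single-column get either finds the column or throws NotFoundException.
        static boolean allowed(Cassandra.Client client) throws Exception {
            ColumnPath perm = new ColumnPath("AssetPermissions");
            perm.setColumn("user42".getBytes("UTF-8"));
            try {
                client.get("Keyspace1", "asset-uuid-1", perm, ConsistencyLevel.QUORUM);
                return true;
            } catch (NotFoundException e) {
                return false;
            }
        }
    }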
Re: Cassandra Java Client
Hi Dop, you may want to look at Hector as a low-level Cassandra client on which you build Jassandra, adding Hibernate-style magic etc. like other people have done with ORM layers on top of it. Hector's main features include extensive JMX counters, failover, and connection pooling. It's available for all recent versions, including 0.5.0, 0.5.1, 0.6.0 and 0.6.1. On Mon, Apr 19, 2010 at 5:58 PM, Dop Sun su...@dopsun.com wrote: Well, there are a couple of points behind why Jassandra was created: [snip: full message quoted earlier in this thread]
Re: [RELEASE] 0.6.0
On Mon, 2010-04-19 at 12:02 -0500, Ted Zlatanov wrote: EE It's the first item on debian/TODO, but, you know, patches welcome and EE all that. TZ The appended patch has been sufficient for me. Eric, do you need me to open a ticket for this, too, or is what I posted sufficient? Feel free to open a ticket, that never hurts. I had planned to use the maintainer scripts to create a system user (in an idempotent way), with a default configuration that used this new user. I had also planned to ensure that permissions were updated accordingly when upgrading from a previous version. -- Eric Evans eev...@rackspace.com
PropertyFileEndPointSnitch
When building the PropertyFileEndPointSnitch into the jar cassandra-propsnitch.jar, the files in the jar end up at src/java/org/apache/cassandra/locator/PropertyFileEndPointSnitch.class instead of org/apache/cassandra/locator/PropertyFileEndPointSnitch.class. Am I doing something wrong, is this intended behavior, or is it a bug? -- Regards Erik
RE: Map/Reduce Cassandra Output
If you used that snippet of code, all connections would go through the same seed: the input code does additional work to determine which nodes are holding particular key ranges, and then connects directly. For outputting from Hadoop to Cassandra, you may want to consider using a Java client like Hector, which will handle the load balancing for you. http://github.com/rantav/hector Thanks, Stu -Original Message- From: Sonny Heer sonnyh...@gmail.com Sent: Monday, April 19, 2010 11:29am To: cassandra-u...@incubator.apache.org Subject: Map/Reduce Cassandra Output Unlike the wordcount example, my input source is a directory, and I have a split class and record reader defined. Also unlike wordcount, during reduce I need to insert into Cassandra. I notice the wordcount input retrieves a handle on a Cassandra client like this:

    TSocket socket = new TSocket(DatabaseDescriptor.getSeeds().iterator().next().getHostAddress(),
            DatabaseDescriptor.getThriftPort());
    TBinaryProtocol binaryProtocol = new TBinaryProtocol(socket, false, false);
    Cassandra.Client client = new Cassandra.Client(binaryProtocol);

Would all Hadoop nodes go to the same seed if I use this code to insert data, without balancing it? Has this been done somewhere in the Cassandra code already?
restore with snapshot
I am working on finalizing our backup and restore procedures for a Cassandra cluster running on EC2. I understand, based on the wiki, that in order to replace a single node I don't actually need to put data on that node: I just need to bootstrap the new node into the cluster and it will get data from the other nodes. However, would it speed up the process if that node already has the data from the node it is replacing? Also, what do I do if the entire cluster goes down? I am planning to snapshot the data each night for each node. Should I save the system keyspace snapshots? Is it problematic to bring the cluster back up with new IPs on each node, but the same tokens as before? Lee Parker
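For reference, nightly per-node snapshots like the ones described here are usually taken with nodetool; a minimal sketch, where the host and snapshot tag are illustrative:

    # Take a snapshot on one node; hard links appear under each keyspace's snapshots/ directory.
    nodetool -host 10.0.0.1 snapshot nightly-2010-04-19
    # Remove old snapshots once they have been copied off the node.
    nodetool -host 10.0.0.1 clearsnapshot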
Re: Data model question - column names sort
On Thu, Apr 15, 2010 at 6:01 PM, Sonny Heer sonnyh...@gmail.com wrote: Need a way to have two different types of indexes.

Key: aTextKey
ColumnName: aTextColumnName:55
Value:

Key: aTextKey
ColumnName: 55:aTextColumnName
Value:

All the valuable information is stored in the column name itself. The above two can be in different column families. Queries: Given a key, page me a list of numerical values sorted on aTextColumnName. Given a key, page me a list of text values sorted on a numerical value. This approach would require left-padding the numeric value for the second index so Cassandra can sort on column names correctly. Don't do that; pack the numeric value into a fixed-length byte array instead. Then you don't have to do any expensive string operations in the comparator. -Jonathan
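A minimal sketch of the packing Jonathan suggests, assuming non-negative values (a big-endian two's-complement encoding only byte-compares in numeric order for non-negative longs); a column name encoded this way can also be sorted by Cassandra's LongType comparator:

    import java.nio.ByteBuffer;

    public final class LongPacker {
        // Encode a long as a fixed-length, big-endian 8-byte array so that
        // byte-wise comparison of column names matches numeric order (for v >= 0).
        public static byte[] pack(long v) {
            return ByteBuffer.allocate(8).putLong(v).array();
        }

        public static long unpack(byte[] b) {
            return ByteBuffer.wrap(b).getLong();
        }
    }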
RE: Cassandra Java Client
Hi Ran: Yep, it looks like there is a possibility that I can add dependencies on Hector and build Jassandra's functionality on top of it. I would take this chance to extend the discussion about an “xxx client for Cassandra” a little bit. In short, Cassandra may need a kind of sub-project to define the “xxx-client for Cassandra” for the most popular platforms (like Python, Java, .NET), or to define a framework (standard, guideline, or whatever) and let the community port/implement it in different languages and platforms. I believe the Cassandra product itself needs the flexibility to change the Thrift API at any given time, including deprecating old APIs which may have bad performance, or adding new APIs to cover new functionality. But applications built on Cassandra and deployed in production generally cannot bear this (generally, meaning everyone except companies like Facebook or Digg, who have huge teams to follow the changes in Cassandra). If these products depend on the Thrift API directly, the deployments will eventually be left behind on old versions. This problem happened in the RDBMS world, and it was eventually resolved with the xDBC APIs: when a new database comes out, it provides the general features plus its special features, so applications built on database technologies can, in most cases, move smoothly to the latest database server versions. With another layer of abstraction, the performance of the application does not necessarily get slower, since the developer of the newly introduced API can use the latest version of the Thrift API to give the application's requests the best performance. An immediate example: the querying API in Thrift (get_xxx, multiget, and get_key_slice) has been enhanced a lot. Before an “xxx-client for Cassandra” is updated, an application can keep working with a new version of Cassandra using the old API; once the “xxx-client” is updated, the application immediately uses the new features without changing its code. The reason I believe this “xxx-client for Cassandra” is best done as a sub-project is that, since the API of Cassandra is still changing at this stage, the design of such an API needs a lot of inside details and guidance. It is very difficult for people from outside, like me, to define an API based on guessing that will eventually be flexible enough to support future Cassandra. I can see that several “xxx client for Cassandra” projects have been abandoned, and my guess is that this is why. Maybe one day Jassandra also cannot be extended to meet the API of a future version of Cassandra, and I may abandon it as well. :) Cheers~~~ Dop From: Ran Tavory [mailto:ran...@gmail.com] Sent: Tuesday, April 20, 2010 1:36 AM To: user@cassandra.apache.org Subject: Re: Cassandra Java Client Hi Dop, you may want to look at Hector as a low-level Cassandra client on which you build Jassandra, adding Hibernate-style magic etc. like other people have done with ORM layers on top of it. Hector's main features include extensive JMX counters, failover, and connection pooling. It's available for all recent versions, including 0.5.0, 0.5.1, 0.6.0 and 0.6.1. On Mon, Apr 19, 2010 at 5:58 PM, Dop Sun su...@dopsun.com wrote: Well, there are a couple of points behind why Jassandra was created: [snip: full message quoted earlier in this thread]
Re: Clarification on Ring operations in Cassandra 0.5.1
On Thu, Apr 15, 2010 at 6:10 PM, Anthony Molinaro antho...@alumni.caltech.edu wrote:

1) shut down cassandra on the instance I want to replace
2) create a new instance, start cassandra with AutoBootstrap = true
3) run nodeprobe removetoken against the token of the instance I am replacing

Then, according to the 'Handling failure' section, the new instance will find the appropriate position automatically. However, it's not clear to me if this means it will take the same range as the shut-down node or not, because normally AutoBootstrap == true means it will take half the keys from the node with the most disk space used (from the 'Bootstrap' section). So will the process I describe above result in what I want, a new node replacing an old one? As you noted, it does not exactly replace the old one. If you require the token to be the same as the dead one's, then you should manually move the new node after removing the dead one. How does removetoken know which instance to remove; does it remove the Down instance? Tokens are unique per node. (Those are the values you see in nodetool ring.) Another hopefully minor question: if I bring up a new node with AutoBootstrap = false, what happens? Does it join the ring but without data? Yes. And without a token range? No. (This is why you should not do that.) Can I then 'nodeprobe move' to the token for the range I want to take over, and achieve the same as step 2 above? You can't have two nodes with the same token in the ring at once. So, you can removetoken the old node first, then bootstrap the new one (just specify InitialToken in the config to avoid having it guess one), or you can make it a 3-step process (bootstrap, remove, move) to avoid transferring so much data around. -Jonathan
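A minimal sketch of the remove-then-move sequence Jonathan describes; the hosts and the token value are illustrative:

    # Remove the dead node's token from the ring (run against any live node).
    nodeprobe -host 10.0.0.1 removetoken 85070591730234615865843651857942052864
    # After the replacement node has bootstrapped, move it onto the old token.
    nodeprobe -host 10.0.0.5 move 85070591730234615865843651857942052864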
Re: effective modeling for fixed limit columns
Limiting by number of columns in a row will perform very poorly. Limiting by the time a column has existed can perform quite well, and was added by Sylvain for 0.7 in https://issues.apache.org/jira/browse/CASSANDRA-699 On Fri, Apr 16, 2010 at 1:50 PM, Chris Shorrock ch...@shorrockin.com wrote: I'm attempting to come up with a technique for limiting the number of columns a single key (or super column - it doesn't matter too much for the context of this conversation) may contain at any one time. My actual use-case is a little too meaty to describe, so an alternate use-case for this mechanism could be: construct a twitter-esque feed which maintains a list of N tweets. Tweets (in this system - and in reality, I suppose) occur at such a rate that you want to limit a given user's feed to N items. You do not have the ability to store an infinite number of tweets due to the physical constraints of your hardware. My first-idea answer is: when a tweet is inserted into the feed of a given person, you then do a count and delete of any outstanding tweets. In reality you could first count, then (if count >= N) do a batch mutate for the insertion of the new entry and the removal of the old. My issue with this approach is that after a certain point, every new entry into the system will incur the removal of an old entry; the count, once a feed has reached N, will always be >= N on any subsequent queries. Depending on how you index the tweets, you may need to actually do a read instead of a count to get the row identifiers. My second approach was to utilize a slot system where you have a record stored somewhere that indicates the next slot for insertion. This can be thought of as a fixed-length array where you store the next insertion point in some other column family. When a new tweet occurs, you retrieve the current slot metadata, insert into that index, then update the metadata for the next insertion. My concerns with this relate to synchronization and losing entries due to concurrent operations, and I'd rather not have to use something like ZooKeeper to synchronize in the application cluster. I have some other ideas but I'm mostly just spit-balling at this point, so I thought I'd reach out to the collective intelligence of the group to see if anyone has implemented something similar. Thanks in advance.
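A minimal sketch of the count-and-delete idea with the 0.6 Thrift API; the keyspace and CF names are illustrative, the client is assumed to be already connected, and note that it has exactly the race the thread worries about, since the count and the removes are not atomic:

    import java.util.List;
    import org.apache.cassandra.thrift.*;

    public final class FeedTrimmer {
        static final int MAX_COLUMNS = 100; // the "N" from the thread; illustrative

        // Trim one user's feed row down to MAX_COLUMNS by deleting the oldest columns.
        // Assumes column names sort oldest-first (e.g. TimeUUID) under the CF's comparator.
        static void trim(Cassandra.Client client, String key) throws Exception {
            ColumnParent feed = new ColumnParent("Feed");
            int n = client.get_count("Keyspace1", key, feed, ConsistencyLevel.QUORUM);
            if (n <= MAX_COLUMNS)
                return;
            // Slice the oldest (n - MAX_COLUMNS) columns from the start of the row.
            SlicePredicate oldest = new SlicePredicate();
            oldest.setSlice_range(new SliceRange(new byte[0], new byte[0], false, n - MAX_COLUMNS));
            List<ColumnOrSuperColumn> victims =
                    client.get_slice("Keyspace1", key, feed, oldest, ConsistencyLevel.QUORUM);
            for (ColumnOrSuperColumn cosc : victims) {
                ColumnPath path = new ColumnPath("Feed");
                path.setColumn(cosc.column.name);
                client.remove("Keyspace1", key, path, System.currentTimeMillis(), ConsistencyLevel.QUORUM);
            }
        }
    }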
Re: why read operation use so much of memory?
(Moving to users@ list.) Like any Java server, Cassandra will use as much memory in its heap as you allow it to. You can request a GC from jconsole to see what its approximate real working set is. http://wiki.apache.org/cassandra/SSTableMemtable explains why reads are slower than writes. You can tune this by using the key cache, row cache, or by using range queries instead of requesting rows one at a time. contrib/py_stress is a better starting place for a benchmark than rolling your own, btw; we see about 8000 reads/s with that on a 4-core server. On Sun, Apr 18, 2010 at 8:40 PM, Bingbing Liu rucb...@gmail.com wrote: Hi all, I have a cluster of 5 nodes; each node has a 4-core CPU and 8 GB of memory. I am using Cassandra 0.6-beta3 for testing. First, I inserted 6,000,000 rows, each of which is 1 KB, and the write speed was very exciting. But then, when I read them back one row at a time from two clients simultaneously, one of the clients was very slow and took a long time, and I found that on each node the Cassandra process occupies about 7 GB of memory (using the top command), which puzzled me. Why does the read operation use so much memory? Maybe I missed something? Thx. 2010-04-18 Bingbing Liu
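For reference, a read benchmark with contrib/py_stress could look like this; the flags mirror the insert invocation quoted later in this digest, and the hosts and counts are illustrative:

    # Read 1,000,000 keys with 50 threads against two nodes.
    python stress.py -o read -n 1000000 -t 50 -d 10.0.0.1,10.0.0.2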
Re: cassandra monitoring
Anything that can consume JMX. On Mon, Apr 19, 2010 at 5:34 AM, Simeonov, Daniel daniel.simeo...@sap.com wrote: Hi, What is the preferred way of monitoring Cassandra clusters? Is Cassandra integrated with Ganglia? Thank you very much! Best regards, Daniel.
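For instance, a plain JMX client can attach to a node and read both the standard JVM MBeans and the Cassandra-specific ones (registered under domains like org.apache.cassandra.service), and feed them to Ganglia or anything similar. A minimal sketch, assuming the default JMX port 8080 of the 0.6 packaging; the host and port are illustrative:

    import java.lang.management.MemoryMXBean;
    import javax.management.JMX;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class JmxPeek {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://10.0.0.1:8080/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url, null);
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // Standard JVM MBean: current heap usage.
                MemoryMXBean memory = JMX.newMXBeanProxy(mbs,
                        new ObjectName("java.lang:type=Memory"), MemoryMXBean.class);
                System.out.println("heap: " + memory.getHeapMemoryUsage());
                // Cassandra-specific MBeans live under org.apache.cassandra.*;
                // list them and poll whichever attributes you care about.
                for (ObjectName name : mbs.queryNames(new ObjectName("org.apache.cassandra.*:*"), null))
                    System.out.println(name);
            } finally {
                connector.close();
            }
        }
    }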
Re: tcp CLOSE_WAIT bug
Is this after doing a bootstrap or other streaming operation? Or did a node go down? The internal sockets are supposed to remain open otherwise. On Mon, Apr 19, 2010 at 10:56 AM, Ingram Chen ingramc...@gmail.com wrote: Thanks for the information. We do use connection pools with the Thrift client, and ThriftAddress is on port 9160. The problematic connections we found are all on port 7000, which is the internal communications port between nodes. I guess this is related to StreamingService. On Mon, Apr 19, 2010 at 23:46, Brandon Williams dri...@gmail.com wrote: On Mon, Apr 19, 2010 at 10:27 AM, Ingram Chen ingramc...@gmail.com wrote: Hi all, We have observed several connections between nodes in CLOSE_WAIT after several hours of operation: This is symptomatic of not pooling your client connections correctly. Be sure you're using one connection per thread, not one connection per operation. -Brandon -- Ingram Chen online share order: http://dinbendon.net blog: http://www.javaworld.com.tw/roller/page/ingramchen
Re: 0.6 insert performance .... Re: [RELEASE] 0.6.1
It's hard to tell from those slides, but it looks like the slowdown doesn't hit until after several GCs. Perhaps this is compaction kicking in, not GC? The extra I/O + CPU load from compaction will definitely cause a drop in throughput. On Mon, Apr 19, 2010 at 6:14 AM, Masood Mortazavi masoodmortaz...@gmail.com wrote: I wonder if anyone can use the new logging of GC activity (CASSANDRA-813) to confirm this: http://www.slideshare.net/schubertzhang/cassandra-060-insert-throughput - m. On Sun, Apr 18, 2010 at 6:58 PM, Eric Evans eev...@rackspace.com wrote: Hot on the heels of 0.6.0 comes our latest, 0.6.1. This stable point release contains a number of important bugfixes[1] and is a painless upgrade from 0.6.0. Enjoy! [1]: http://bit.ly/9NqwAb (changelog) -- Eric Evans eev...@rackspace.com
Re: Map/Reduce Cassandra Output
Thanks Stu. I will take a look at Hector. Do you know where the input code does the additional work? On Mon, Apr 19, 2010 at 11:20 AM, Stu Hood stu.h...@rackspace.com wrote: If you used that snippet of code, all connections would go through the same seed: the input code does additional work to determine which nodes are holding particular key ranges, and then connects directly. [snip: full message quoted earlier in this thread]
Re: busy thread on IncomingStreamReader ?
On 4/17/10 6:47 PM, Ingram Chen wrote: after upgrading the JDK from 1.6.0_16 to 1.6.0_20, the problem was solved. FYI, this sounds like it might be: https://issues.apache.org/jira/browse/CASSANDRA-896 http://bugs.sun.com/view_bug.do?bug_id=6805775 where garbage collection issues in JVM/JDKs before 7.b70 lead to GC storming, which hoses performance. =Rob
get_range_slices in hector
Is there a version of hector that has an interface to get_range_slices? Or should I provide a patch? Cheers, Chris Dean
Re: Help with MapReduce
Most likely this means that the count() operation is taking too long for the configured RPCTimeout. Counts get unreliable after a certain number of columns under a key, in my experience. jesse -- jesse mcconnell jesse.mcconn...@gmail.com On Mon, Apr 19, 2010 at 19:12, Joost Ouwerkerk jo...@openplaces.org wrote: I'm slowly getting somewhere with Cassandra... I have successfully imported 1.5 million rows using MapReduce. This took about 8 minutes on an 8-node cluster, which is comparable to the time it takes with HBase. Now I'm having trouble scanning this data. I've created a simple MapReduce job that counts rows in my ColumnFamily. The job fails with most tasks throwing the following exception. Anyone have any ideas what's going wrong?

java.lang.RuntimeException: TimedOutException()
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:165)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:215)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:97)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:91)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: TimedOutException()
    at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11015)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:623)
    at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:597)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:142)
    ... 11 more

On Sun, Apr 18, 2010 at 6:01 PM, Stu Hood stu.h...@rackspace.com wrote: In 0.6.0 and trunk, it is located at src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java You might be using a pre-release version of 0.6 if you are seeing a fat-client-based InputFormat. -Original Message- From: Joost Ouwerkerk jo...@openplaces.org Sent: Sunday, April 18, 2010 4:53pm To: user@cassandra.apache.org Subject: Re: Help with MapReduce Where is the ColumnFamilyInputFormat that uses Thrift? I don't actually have a preference about client; I just want to be consistent with ColumnFamilyInputFormat. On Sun, Apr 18, 2010 at 5:37 PM, Stu Hood stu.h...@rackspace.com wrote: ColumnFamilyInputFormat no longer uses the fat client API, and instead uses Thrift. There are still some significant problems with the fat client, so it shouldn't be used without a good understanding of those problems. If you still want to use it, check out contrib/bmt_example, but I'd recommend that you use Thrift for now. -Original Message- From: Joost Ouwerkerk jo...@openplaces.org Sent: Sunday, April 18, 2010 2:59pm To: user@cassandra.apache.org Subject: Help with MapReduce I'm a Cassandra noob trying to validate Cassandra as a viable alternative to HBase (which we've been using for over a year) for our application. So far, I've had no success getting Cassandra working with MapReduce. My first step is inserting data into Cassandra. I've created a MapRed job using the fat client API. I'm using the fat client (StorageProxy) because that's what ColumnFamilyInputFormat uses and I want to use the same API for both read and write jobs. When I call StorageProxy.mutate(), nothing happens: the job completes as if it had done something, but in fact nothing has changed in the cluster. When I call StorageProxy.mutateBlocking(), I get an IOException complaining that there is no connection to the cluster. I've concluded with the debugger that StorageService is not connecting to the cluster, even though I've specified the correct seed and ListenAddress (I'm using the exact same storage-conf.xml as the nodes in the cluster). I'm sure I'm missing something obvious in the configuration or my setup, but since I'm new to Cassandra, I can't see what it is. Any help appreciated, Joost
Re: Help with MapReduce
Err, not count in your case, but same symptom: Cassandra can't return the answer to your query within the configured RPCTimeout. cheers, jesse -- jesse mcconnell jesse.mcconn...@gmail.com On Mon, Apr 19, 2010 at 19:40, Jesse McConnell jesse.mcconn...@gmail.com wrote: Most likely this means that the count() operation is taking too long for the configured RPCTimeout. Counts get unreliable after a certain number of columns under a key, in my experience. jesse On Mon, Apr 19, 2010 at 19:12, Joost Ouwerkerk jo...@openplaces.org wrote: I'm slowly getting somewhere with Cassandra... I have successfully imported 1.5 million rows using MapReduce. [snip: rest of message and stack trace quoted earlier in this thread]
Re: Help with MapReduce
hmm, might be too much data. In the case of a supercolumn, how do I specify which sub-columns to retrieve? Or can I only retrieve entire supercolumns? On Mon, Apr 19, 2010 at 8:47 PM, Jonathan Ellis jbel...@gmail.com wrote: Possibly you are asking it to retrieve too many columns per row. Possibly there is something else causing poor performance, like swapping. On Mon, Apr 19, 2010 at 7:12 PM, Joost Ouwerkerk jo...@openplaces.org wrote: I'm slowly getting somewhere with Cassandra... I have successfully imported 1.5 million rows using MapReduce. [snip: rest of message and stack trace quoted earlier in this thread]
Re: Help with MapReduce
the latter, if you are retrieving multiple supercolumns. On Mon, Apr 19, 2010 at 8:10 PM, Joost Ouwerkerk jo...@openplaces.org wrote: hmm, might be too much data. In the case of a supercolumn, how do I specify which sub-columns to retrieve? Or can I only retrieve entire supercolumns? On Mon, Apr 19, 2010 at 8:47 PM, Jonathan Ellis jbel...@gmail.com wrote: Possibly you are asking it to retrieve too many columns per row. Possibly there is something else causing poor performance, like swapping. On Mon, Apr 19, 2010 at 7:12 PM, Joost Ouwerkerk jo...@openplaces.org wrote: I'm slowly getting somewhere with Cassandra... I have successfully imported 1.5 million rows using MapReduce. This took about 8 minutes on an 8-node cluster, which is comparable to the time it takes with HBase. Now I'm having trouble scanning this data. I've created a simple MapReduce job that counts rows in my ColumnFamily. The Job fails with most tasks throwing the following Exception. Anyone have any ideas what's going wrong? java.lang.RuntimeException: TimedOutException() at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:165) at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:215) at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:97) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130) at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:91) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423) at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: TimedOutException() at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11015) at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:623) at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:597) at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:142) ... 11 more On Sun, Apr 18, 2010 at 6:01 PM, Stu Hood stu.h...@rackspace.com wrote: In 0.6.0 and trunk, it is located at src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java You might be using a pre-release version of 0.6 if you are seeing a fat client based InputFormat. -Original Message- From: Joost Ouwerkerk jo...@openplaces.org Sent: Sunday, April 18, 2010 4:53pm To: user@cassandra.apache.org Subject: Re: Help with MapReduce Where is the ColumnFamilyInputFormat that uses Thrift? I don't actually have a preference about client, I just want to be consistent with ColumnInputFormat. On Sun, Apr 18, 2010 at 5:37 PM, Stu Hood stu.h...@rackspace.com wrote: ColumnFamilyInputFormat no longer uses the fat client API, and instead uses Thrift. There are still some significant problems with the fat client, so it shouldn't be used without a good understanding of those problems. If you still want to use it, check out contrib/bmt_example, but I'd recommend that you use thrift for now. 
Re: 0.6.1 insert 1B rows, crashed when using py_stress
Please also post your jvm-heap and GC options, i.e. the setting in cassandra.in.sh And what about your node hardware? On Tue, Apr 20, 2010 at 9:22 AM, Ken Sandney bluefl...@gmail.com wrote: Hi I am doing an insert test with 9 nodes, the command: stress.py -n 10 -t 1000 -c 10 -o insert -i 5 -d 10.0.0.1,10.0.0.2. and 5 of the 9 nodes crashed; only about 6'500'000 rows were inserted. I checked the system.log and it seems the reason is 'out of memory'. I don't know if this had something to do with my settings. Any idea about this? Thank you, and the following are the errors from system.log ERROR [CACHETABLE-TIMER-1] 2010-04-19 20:43:14,013 CassandraDaemon.java (line 78) Fatal exception in thread Thread[CACHETABLE-TIMER-1,5,main] java.lang.OutOfMemoryError: Java heap space at org.apache.cassandra.utils.ExpiringMap$CacheMonitor.run(ExpiringMap.java:76) at java.util.TimerThread.mainLoop(Timer.java:512) at java.util.TimerThread.run(Timer.java:462) ERROR [ROW-MUTATION-STAGE:9] 2010-04-19 20:43:27,932 CassandraDaemon.java (line 78) Fatal exception in thread Thread[ROW-MUTATION-STAGE:9,5,main] java.lang.OutOfMemoryError: Java heap space at java.util.concurrent.ConcurrentSkipListMap.doPut(ConcurrentSkipListMap.java:893) at java.util.concurrent.ConcurrentSkipListMap.putIfAbsent(ConcurrentSkipListMap.java:1893) at org.apache.cassandra.db.ColumnFamily.addColumn(ColumnFamily.java:192) at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:118) at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:108) at org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation.java:359) at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:369) at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:322) at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:45) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) and another INFO [GC inspection] 2010-04-19 21:13:09,034 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 2016 ms, 1239096 reclaimed leaving 1094238944 used; max is 1211826176 ERROR [Thread-14] 2010-04-19 21:23:18,508 CassandraDaemon.java (line 78) Fatal exception in thread Thread[Thread-14,5,main] java.lang.OutOfMemoryError: Java heap space at sun.nio.ch.Util.releaseTemporaryDirectBuffer(Util.java:67) at sun.nio.ch.IOUtil.read(IOUtil.java:212) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236) at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:176) at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:86) at java.io.InputStream.read(InputStream.java:85) at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:64) at java.io.DataInputStream.readInt(DataInputStream.java:370) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:70) ERROR [COMPACTION-POOL:1] 2010-04-19 21:23:18,514 DebuggableThreadPoolExecutor.java (line 94) Error in executor futuretask java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:86) at org.apache.cassandra.db.CompactionManager$CompactionExecutor.afterExecute(CompactionManager.java:582) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.lang.OutOfMemoryError: Java heap space INFO [FLUSH-WRITER-POOL:1] 2010-04-19 21:23:25,600 Memtable.java (line 162) Completed flushing /m/cassandra/data/Keyspace1/Standard1-623-Data.db ERROR [Thread-13] 2010-04-19 21:23:18,514 CassandraDaemon.java (line 78) Fatal exception in thread Thread[Thread-13,5,main] java.lang.OutOfMemoryError: Java heap space ERROR [Thread-15] 2010-04-19 21:23:18,514 CassandraDaemon.java (line 78) Fatal exception in thread Thread[Thread-15,5,main] java.lang.OutOfMemoryError: Java heap space ERROR [CACHETABLE-TIMER-1] 2010-04-19 21:23:18,514 CassandraDaemon.java (line 78) Fatal exception in thread
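The GC inspection line above is worth decoding: even after a full ConcurrentMarkSweep pass the heap stays within a few percent of its ceiling, which is why the very next allocations fail. A quick check on the numbers copied from that log line:

    public class HeapHeadroom {
        public static void main(String[] args) {
            long used = 1094238944L, max = 1211826176L;  // "used" and "max" from the GCInspector line
            System.out.println(used * 100 / max + "% of the heap still occupied after CMS"); // 90%
            System.out.println((max - used) / (1 << 20) + " MB of headroom left");           // 112 MB
        }
    }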
Re: 0.6.1 insert 1B rows, crashed when using py_stress
Seems you should configure a larger jvm-heap. On Tue, Apr 20, 2010 at 9:32 AM, Schubert Zhang zson...@gmail.com wrote: Please also post your jvm-heap and GC options, i.e. the setting in cassandra.in.sh And what about your node hardware? On Tue, Apr 20, 2010 at 9:22 AM, Ken Sandney bluefl...@gmail.com wrote: Hi I am doing an insert test with 9 nodes ... (rest of the quoted message, including the stack traces, is identical to the previous post and omitted)
Re: Help with MapReduce
And when retrieving only one supercolumn? Can I further specify which subcolumns to retrieve? On Mon, Apr 19, 2010 at 9:29 PM, Jonathan Ellis jbel...@gmail.com wrote: the latter, if you are retrieving multiple supercolumns. On Mon, Apr 19, 2010 at 8:10 PM, Joost Ouwerkerk jo...@openplaces.org wrote: hmm, might be too much data. In the case of a supercolumn, how do I specify which sub-columns to retrieve? Or can I only retrieve entire supercolumns? On Mon, Apr 19, 2010 at 8:47 PM, Jonathan Ellis jbel...@gmail.com wrote: Possibly you are asking it to retrieve too many columns per row. Possibly there is something else causing poor performance, like swapping. ... (rest of the quoted thread, including the TimedOutException stack trace and the original message, is identical to the earlier posts and omitted)
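Outside Hadoop, the raw Thrift API does let you reach individual subcolumns: setting super_column on the ColumnParent scopes the slice to one supercolumn, and the predicate then names the subcolumns. A hedged fragment against the 0.6-era signature (keyspace was still an explicit argument then); the keyspace, column family, and column names are placeholders:

    import java.util.Arrays;
    import java.util.List;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.SlicePredicate;

    public class SubcolumnSlice {
        // client: an already-opened Thrift connection (see the pooling sketch later in this digest).
        static List<ColumnOrSuperColumn> readSubcolumns(Cassandra.Client client) throws Exception {
            ColumnParent parent = new ColumnParent("MySuperCF");
            parent.setSuper_column("sc1".getBytes());        // scope the slice to one supercolumn
            SlicePredicate predicate = new SlicePredicate();
            predicate.setColumn_names(Arrays.asList(         // fetch only these subcolumns
                    "subA".getBytes(), "subB".getBytes()));
            return client.get_slice("MyKeyspace", "rowKey1", parent, predicate,
                                    ConsistencyLevel.ONE);
        }
    }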
Re: 0.6.1 insert 1B rows, crashed when using py_stress
On Mon, Apr 19, 2010 at 9:06 PM, Schubert Zhang zson...@gmail.com wrote: 2. Reject requests when short of resources, instead of throwing OOME and exiting (crashing). Right, that is the crux of the problem. It will be addressed here: https://issues.apache.org/jira/browse/CASSANDRA-685 -Brandon
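The general pattern behind "reject instead of OOM" is a bounded work queue whose overflow surfaces as an error that can be translated into backpressure on the client. The sketch below only illustrates that pattern in plain java.util.concurrent terms; it is not the CASSANDRA-685 patch itself, and the pool and queue sizes are arbitrary:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.RejectedExecutionException;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class BoundedStage {
        public static void main(String[] args) {
            // A stage with a bounded backlog: when the queue fills, execute() throws
            // RejectedExecutionException instead of buffering work until the heap dies.
            ThreadPoolExecutor mutationStage = new ThreadPoolExecutor(
                    8, 8, 60L, TimeUnit.SECONDS,
                    new ArrayBlockingQueue<Runnable>(4096),
                    new ThreadPoolExecutor.AbortPolicy());
            Runnable mutation = new Runnable() { public void run() { /* apply the write */ } };
            try {
                mutationStage.execute(mutation);
            } catch (RejectedExecutionException e) {
                // Translate into backpressure on the client rather than letting the node OOM.
            }
            mutationStage.shutdown();
        }
    }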
Re: 0.6.1 insert 1B rows, crashed when using py_stress
I am just running Cassandra on normal boxes, and granting 1GB of a total 2GB to Cassandra is reasonable, I think. Can this problem be resolved by tuning the thresholds described on this page http://wiki.apache.org/cassandra/MemtableThresholds , or just by waiting for the 0.7 release as Brandon mentioned? On Tue, Apr 20, 2010 at 10:15 AM, Jonathan Ellis jbel...@gmail.com wrote: Schubert, I don't know if you saw this in the other thread referencing your slides: It looks like the slowdown doesn't hit until after several GCs, although it's hard to tell since the scale is different on the GC graph and the insert throughput ones. Perhaps this is compaction kicking in, not GCs? Definitely the extra I/O + CPU load from compaction will cause a drop in throughput. On Mon, Apr 19, 2010 at 9:06 PM, Schubert Zhang zson...@gmail.com wrote: -Xmx1G is too small. In my cluster there is 8GB ram on each node, and I grant 6GB to cassandra. Please see my test @ http://www.slideshare.net/schubertzhang/presentations – Memory and GC are always the bottleneck and a big issue for java-based infrastructure software! References: – http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts – https://issues.apache.org/jira/browse/CASSANDRA-896 (LinkedBlockingQueue issue, fixed in jdk-6u19) In fact, whenever I use java-based infrastructure software such as Cassandra, Hadoop, or HBase, I am eventually pained by such memory/GC issues. Then we have to provision higher-end hardware with more RAM (such as 32GB~64GB) and more CPU cores (such as 8~16). And we still cannot control the Out-Of-Memory-Error. I am thinking, maybe it is not right to leave the job of memory control to the JVM. I have long experience in telecom and embedded software over the past ten years, which demands robust programs and small RAM. I want to discuss the following ideas with the community: 1. Manage the memory ourselves: allocate objects/resources (memory) at the initialization phase, and assign instances at runtime. 2. Reject requests when short of resources, instead of throwing OOME and exiting (crashing). 3. I know, it is not easy in a java program. Schubert On Tue, Apr 20, 2010 at 9:40 AM, Ken Sandney bluefl...@gmail.com wrote: here are my JVM options; by default I didn't modify them, from cassandra.in.sh # Arguments to pass to the JVM JVM_OPTS= \ -ea \ -Xms128M \ -Xmx1G \ -XX:TargetSurvivorRatio=90 \ -XX:+AggressiveOpts \ -XX:+UseParNewGC \ -XX:+UseConcMarkSweepGC \ -XX:+CMSParallelRemarkEnabled \ -XX:+HeapDumpOnOutOfMemoryError \ -XX:SurvivorRatio=128 \ -XX:MaxTenuringThreshold=0 \ -Dcom.sun.management.jmxremote.port=8080 \ -Dcom.sun.management.jmxremote.ssl=false \ -Dcom.sun.management.jmxremote.authenticate=false and my box is a normal pc with 2GB ram, Intel E3200 @ 2.40GHz. By the way, I am using the latest Sun JDK On Tue, Apr 20, 2010 at 9:33 AM, Schubert Zhang zson...@gmail.com wrote: Seems you should configure a larger jvm-heap. ... (rest of the quoted thread, including the stack traces, is identical to the earlier posts and omitted)
Re: busy thread on IncomingStreamReader ?
Ouch! I spoke too early! We still suffer the same problems after upgrading to 1.6.0_20. In JMX StreamingService, I see several weird incoming/outgoing transfers: In Host A, 192.168.2.87 StreamingService Status: Done with transfer to /192.168.2.88 StreamingService StreamSources: [/192.168.2.88] StreamingService StreamDestinations: [/192.168.2.88] StreamingService getIncomingFiles=192.168.2.88 [ UserState: /var/lib/cassandra/data/UserState/multiMine-tmp-11-Index.db 0/5718, UserState: /var/lib/cassandra/data/UserState/multiMine-tmp-11-Filter.db 0/325, UserState: /var/lib/cassandra/data/UserState/multiMine-tmp-11-Data.db 0/29831, UserState: /var/lib/cassandra/data/UserState/csArena-tmp-13-Index.db 0/47623, ... omit several 0 received pending files. UserState: /var/lib/cassandra/data/UserState/battleCity2-tmp-19-Data.db 0/355041, UserState: /var/lib/cassandra/data/UserState/mahjong-tmp-12-Data.db 27711/2173906, UserState: /var/lib/cassandra/data/UserState/darkChess-tmp-12-Data.db 27711/18821998, UserState: /var/lib/cassandra/data/UserState/battleCity2-tmp-6-Data.db 27711/743037, UserState: /var/lib/cassandra/data/UserState/big2-tmp-12-Index.db 27711/189214, UserState: /var/lib/cassandra/data/UserState/facebookPoker99-tmp-6-Data.db 27711/1892375, UserState: /var/lib/cassandra/data/UserState/facebookPoker99-tmp-6-Index.db 27711/143216, UserState: /var/lib/cassandra/data/UserState/csArena-tmp-6-Data.db 27711/201188, UserState: /var/lib/cassandra/data/UserState/darkChess-tmp-12-Index.db 27711/354923, UserState: /var/lib/cassandra/data/UserState/big2-tmp-12-Data.db 27711/1260768, UserState: /var/lib/cassandra/data/UserState/mahjong-tmp-12-Index.db 27711/332649, UserState: /var/lib/cassandra/data/UserState/battleCity2-tmp-6-Index.db 27711/39739 ] Lots of files stalled after receiving 27711 bytes; this strange number is the length of the first incoming file, see Host B. Host B, 192.168.2.88 StreamingService Status: Receiving stream StreamingService StreamSources: [/192.168.2.87] StreamingService StreamDestinations: [/192.168.2.87] StreamingService getOutgoingFiles=192.168.2.87 [ /var/lib/cassandra/data/UserState/stream/csArena-6-Index.db 27711/27711, /var/lib/cassandra/data/UserState/stream/csArena-6-Filter.db 0/1165, /var/lib/cassandra/data/UserState/stream/csArena-6-Data.db 0/201188, ... omit pending outgoing files ] It seems that the outgoing transfer does not terminate properly, causing the receiver to go into an infinite loop and busy the thread. From the thread dump, it looks like fc.transferFrom() in IncomingStreamReader never returns: while (bytesRead < pendingFile.getExpectedBytes()) { bytesRead += fc.transferFrom(socketChannel, bytesRead, FileStreamTask.CHUNK_SIZE); pendingFile.update(bytesRead); } On Tue, Apr 20, 2010 at 05:48, Rob Coli rc...@digg.com wrote: On 4/17/10 6:47 PM, Ingram Chen wrote: after upgrading jdk from 1.6.0_16 to 1.6.0_20, the problem was solved. FYI, this sounds like it might be: https://issues.apache.org/jira/browse/CASSANDRA-896 http://bugs.sun.com/view_bug.do;jsessionid=60c39aa55d3666c0c84dd70eb826?bug_id=6805775 Where garbage collection issues in JVM/JDKs before 7.b70 lead to GC storming which hoses performance. =Rob
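Whether transferFrom() is blocked or just making no progress, the loop quoted above has no escape hatch: FileChannel.transferFrom() is allowed to return 0 when no bytes are available from the source, so if the sender stalls mid-file the receiver can spin forever. A defensive variant might look like the fragment below; the names come from the quoted code, MAX_IDLE_ROUNDS is a hypothetical threshold, and this is a sketch of the idea, not the project's actual fix:

    long bytesRead = 0;
    int idleRounds = 0;
    while (bytesRead < pendingFile.getExpectedBytes()) {
        long n = fc.transferFrom(socketChannel, bytesRead, FileStreamTask.CHUNK_SIZE);
        if (n == 0) {
            // No progress: the sender may have stalled or closed; don't spin forever.
            if (++idleRounds > MAX_IDLE_ROUNDS)
                throw new IOException("stream stalled at " + bytesRead + " bytes");
        } else {
            idleRounds = 0;
            bytesRead += n;
            pendingFile.update(bytesRead);
        }
    }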
Re: 0.6.1 insert 1B rows, crashed when using py_stress
Ken, I linked you to the FAQ answering your problem in the first reply you got. Please don't hijack my replies to other people; that's rude. On Mon, Apr 19, 2010 at 9:32 PM, Ken Sandney bluefl...@gmail.com wrote: I am just running Cassandra on normal boxes, and granting 1GB of a total 2GB to Cassandra is reasonable, I think. ... (rest of the quoted thread is identical to the earlier posts and omitted)
Re: Clarification on Ring operations in Cassandra 0.5.1
You can have a look at org.apache.cassandra.service.StorageService public void initServer() throws IOException 1. If AutoBootstrap=false, it means the node is already bootstrapped (not a new node). Usually, the first node is set to false. (1) check the system table to find the saved token; if found, use it, otherwise, (2) check the config for InitialToken; if configured, use it, otherwise, (3) getRandomToken Please refer to org.apache.cassandra.service.StorageService public void initServer() throws IOException and org.apache.cassandra.db.SystemTable public static synchronized StorageMetadata initMetadata() throws IOException 2. If AutoBootstrap=true, it means the node is a new node. Usually, the other new nodes set AutoBootstrap=true. (1) If the seeds include this node itself, go to 1 above. otherwise, (2) If the node is already bootstrapped (check the system table), go to 1 above. otherwise, (3) Get load information of other nodes via Gossip, and wait a while. (4) If InitialToken is configured, use it. otherwise, (5) Find the node token with the heaviest load. In my use case, I always configure InitialToken for each node of a new cluster; then I get good load balance. But when adding a new node to a running cluster (with much data), I let cassandra find the token via load-checking. Schubert On Tue, Apr 20, 2010 at 7:48 AM, Anthony Molinaro antho...@alumni.caltech.edu wrote: On Mon, Apr 19, 2010 at 03:28:26PM -0500, Jonathan Ellis wrote: Can I then 'nodeprobe move token for range I want to take over', and achieve the same as step 2 above? You can't have two nodes with the same token in the ring at once. So, you can removetoken the old node first, then bootstrap the new one (just specify InitialToken in the config to avoid having it guess one), or you can make it a 3-step process (bootstrap, remove, move) to avoid transferring so much data around. So I'm still a little fuzzy, for your 3-step case, on why less data moves, but let me run through the two scenarios and see where we get. Please correct me if I'm wrong on some point. Let's say I have 3 nodes with the random partitioner and rack-unaware strategy, which means I have something like: Node Size Token KeyRange (self + next in ring): A 5G 33 1-66; B 6G 66 34-0; C 2G 0 67-33. Now let's say Node B is giving us some problems, so we want to replace it with another node D. We've outlined 2 processes. In the first process you recommend: 1. removetoken on node B 2. wait for data to move 3. add InitialToken of 66 and AutoBootstrap = true to node D storage-conf.xml, then start it 4. wait for data to move So when you do the removetoken, this will cause the following transfers at stage 2: Node A sends 34-66 to Node C; Node C sends 67-0 to Node A. At stage 4: Node A sends 34-66 to Node D; Node C sends 67-0 to Node D. In the second process I assume you pick a token really close to another token? 1. add InitialToken of 34 and AutoBootstrap = true to node D storage-conf.xml, then start it 2. wait for data to move 3. removetoken on node B 4. wait for data to move 5. movetoken on node D to 66 6. wait for data to move This results in the following moves at stage 2: Node A/B sends 33-34 to Node D (primary token range); Node B sends 34-66 to Node D (replica range). At stage 4: Node C sends 66-0 to Node D (replica range). At stage 6: no data movement, as D already had 33-0. So it seems like you move all the data twice in process 1 and only a small portion twice in process 2 (which is what you said, so hopefully I've outlined correctly what is happening). Does all that sound right? 
Once I've run bootstrap with the InitialToken value set in the config, is it then ignored in subsequent restarts, and if so can I just remove it after that first time? Thanks, -Anthony -- Anthony Molinaro antho...@alumni.caltech.edu
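For the InitialToken values discussed above: the 0-100 style tokens in Anthony's example are illustrative, but real RandomPartitioner tokens live in [0, 2^127), so evenly spaced tokens for an N-node cluster are simply i * 2^127 / N for node i. A small self-contained sketch for computing them:

    import java.math.BigInteger;

    public class TokenSpacing {
        public static void main(String[] args) {
            int n = 3;  // number of nodes; 3 matches the example above
            BigInteger range = BigInteger.valueOf(2).pow(127);  // RandomPartitioner token space
            for (int i = 0; i < n; i++) {
                BigInteger token = range.multiply(BigInteger.valueOf(i))
                                        .divide(BigInteger.valueOf(n));
                System.out.println("node " + i + ": InitialToken = " + token);
            }
        }
    }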
Re: 0.6.1 insert 1B rows, crashed when using py_stress
Sorry I just don't know how to resolve this :) On Tue, Apr 20, 2010 at 10:37 AM, Jonathan Ellis jbel...@gmail.com wrote: Ken, I linked you to the FAQ answering your problem in the first reply you got. Please don't hijack my replies to other people; that's rude. ... (rest of the quoted thread is identical to the earlier posts and omitted)
Re: busy thread on IncomingStreamReader ?
I don't see csArena-tmp-6-Index.db in the incoming files list. If it's not there, that means that it did break out of that while loop. Did you check both logs for exceptions? On Mon, Apr 19, 2010 at 9:36 PM, Ingram Chen ingramc...@gmail.com wrote: Ouch! I spoke too early! We still suffer the same problems after upgrading to 1.6.0_20. ... (rest of the quoted message is identical to the previous post and omitted)
Re: 0.6.1 insert 1B rows, crashed when using py_stress
Jonathan, Thanks. Yes, the scale of the GC graph is different from the throughput one. I will do more checking and tuning in our next test immediately. On Tue, Apr 20, 2010 at 10:39 AM, Ken Sandney bluefl...@gmail.com wrote: Sorry I just don't know how to resolve this :) ... (rest of the quoted thread is identical to the earlier posts and omitted)
Re: tcp CLOSE_WAIT bug
this happened after several hours of operation, and both nodes were started at the same time (clean start without any data), so it might not be related to bootstrap. In system.log I do not see any logs like xxx node dead or exceptions, and both nodes in the test are alive. They serve reads/writes well, too. The four connections between nodes below stay healthy over time: tcp 0 0 :::192.168.2.87:7000 :::192.168.2.88:58447 ESTABLISHED tcp 0 0 :::192.168.2.87:54986 :::192.168.2.88:7000 ESTABLISHED tcp 0 0 :::192.168.2.87:59138 :::192.168.2.88:7000 ESTABLISHED tcp 0 0 :::192.168.2.87:7000 :::192.168.2.88:39074 ESTABLISHED So the connections ending in CLOSE_WAIT should be newly created ones (for streaming?). This seems related to the streaming issues we suffered recently: http://n2.nabble.com/busy-thread-on-IncomingStreamReader-td4908640.html I would like to add some debug code around the opening and closing of sockets to find out what happened. Could you give me a hint about which classes I should look at? On Tue, Apr 20, 2010 at 04:47, Jonathan Ellis jbel...@gmail.com wrote: Is this after doing a bootstrap or other streaming operation? Or did a node go down? The internal sockets are supposed to remain open, otherwise. On Mon, Apr 19, 2010 at 10:56 AM, Ingram Chen ingramc...@gmail.com wrote: Thanks for your information. We do use connection pools with the thrift client, and ThriftAddress is on port 9160. The problematic connections we found are all on port 7000, which is the internal communications port between nodes. I guess this is related to StreamingService. On Mon, Apr 19, 2010 at 23:46, Brandon Williams dri...@gmail.com wrote: On Mon, Apr 19, 2010 at 10:27 AM, Ingram Chen ingramc...@gmail.com wrote: Hi all, We have observed several connections between nodes in CLOSE_WAIT after several hours of operation: This is symptomatic of not pooling your client connections correctly. Be sure you're using one connection per thread, not one connection per operation. -Brandon -- Ingram Chen online share order: http://dinbendon.net blog: http://www.javaworld.com.tw/roller/page/ingramchen
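Brandon's one-connection-per-thread advice can be implemented with a ThreadLocal around the Thrift client. A sketch against the 0.6-era API (default RPC port 9160, unframed transport); the host is an example and error handling is minimal:

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;

    public class ClientPerThread {
        // Each thread opens one socket on first use and then reuses it for every operation.
        private static final ThreadLocal<Cassandra.Client> CLIENT =
                new ThreadLocal<Cassandra.Client>() {
                    @Override protected Cassandra.Client initialValue() {
                        try {
                            TSocket socket = new TSocket("192.168.2.87", 9160);
                            socket.open();
                            return new Cassandra.Client(new TBinaryProtocol(socket));
                        } catch (Exception e) {
                            throw new RuntimeException(e);
                        }
                    }
                };

        public static Cassandra.Client get() { return CLIENT.get(); }
    }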
Re: 0.6 insert performance .... Re: [RELEASE] 0.6.1
Since the scale of the GC graph in the slides is different from the throughput ones, I will do another test for this issue. Thanks for your advice, Masood and Jonathan. --- Here, I just post my cassandra.in.sh. JVM_OPTS= \ -ea \ -Xms128M \ -Xmx6G \ -XX:TargetSurvivorRatio=90 \ -XX:+AggressiveOpts \ -XX:+UseParNewGC \ -XX:+UseConcMarkSweepGC \ -XX:+CMSParallelRemarkEnabled \ -XX:SurvivorRatio=128 \ -XX:MaxTenuringThreshold=0 \ -Dcom.sun.management.jmxremote.port=8081 \ -Dcom.sun.management.jmxremote.ssl=false \ -Dcom.sun.management.jmxremote.authenticate=false On Tue, Apr 20, 2010 at 5:46 AM, Masood Mortazavi masoodmortaz...@gmail.com wrote: Minimizing GC pauses or minimizing time slots allocated to GC pauses -- either through configuration or re-implementations of garbage collection bottlenecks (i.e. object-generation bottlenecks) -- seems to be the immediate approach. (Other approaches appear to be more intrusive.) At code level, using the GC logs, one can investigate further. There may be places where some object recycling can make a larger difference. Trying this first will probably bear more immediate fruit. - m. On Mon, Apr 19, 2010 at 9:11 AM, Daniel Kluesing d...@bluekai.com wrote: We see this behavior as well with 0.6; heap usage graphs look almost identical. The GC is a noticeable bottleneck; we've tried the jdk 6u19 and JRockit VMs. It basically kills any kind of soft real time behavior. From: Masood Mortazavi [mailto:masoodmortaz...@gmail.com] Sent: Monday, April 19, 2010 4:15 AM To: user@cassandra.apache.org; d...@cassandra.apache.org Subject: 0.6 insert performance Re: [RELEASE] 0.6.1 ... (rest of the quoted message and the 0.6.1 release announcement appear earlier in the digest and are omitted)
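For visibility into GC on builds without CASSANDRA-813, the quoted JVM_OPTS can be extended with the standard HotSpot flags -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps, or the JDK's management beans can be polled from inside the process. A sketch of the latter, roughly what a GC inspector does though not the project's implementation; schedule it every few seconds and log the deltas:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcWatcher implements Runnable {
        public void run() {
            // Cumulative per-collector counters; compare successive samples to spot GC storms.
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.println(gc.getName()
                        + ": collections=" + gc.getCollectionCount()
                        + ", totalTimeMs=" + gc.getCollectionTime());
            }
        }
    }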
Re: why read operation use so much of memory?
On Mon, Apr 19, 2010 at 10:28 PM, dir dir sikerasa...@gmail.com wrote: Hi Jonathan, I see this page (http://wiki.apache.org/cassandra/SSTableMemtable) does not exist yet. I think he meant: http://wiki.apache.org/cassandra/MemtableSSTable -Brandon
Re: Help with MapReduce
Ok. This should be ok for now, although not optimal for some jobs. Next issue is node stability during the insert job. The stacktrace below occurred on several nodes while inserting 10 million rows. We're running on 4G machines, 1G of which is allocated to cassandra. What's the best config to prevent OOMs (even if it means sacrificing some performance)? ERROR [COMPACTION-POOL:1] 2010-04-20 01:39:15,853 DebuggableThreadPoolExecutor.java (line 94) Error in executor futuretask java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:86) at org.apache.cassandra.db.CompactionManager$CompactionExecutor.afterExecute(CompactionManager.java:582) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:2786) at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94) at java.io.DataOutputStream.write(DataOutputStream.java:90) at java.io.FilterOutputStream.write(FilterOutputStream.java:80) at org.apache.cassandra.db.ColumnSerializer.writeName(ColumnSerializer.java:39) at org.apache.cassandra.db.SuperColumnSerializer.serialize(SuperColumn.java:301) at org.apache.cassandra.db.SuperColumnSerializer.serialize(SuperColumn.java:284) at org.apache.cassandra.db.ColumnFamilySerializer.serializeForSSTable(ColumnFamilySerializer.java:87) at org.apache.cassandra.db.ColumnFamilySerializer.serializeWithIndexes(ColumnFamilySerializer.java:99) at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:131) at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:41) at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130) at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183) at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94) at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:284) at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:102) at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:83) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) ... 2 more On Mon, Apr 19, 2010 at 10:34 PM, Jonathan Ellis jbel...@gmail.com wrote: Oh, from Hadoop. Yes, you are indeed limited to entire columns or supercolumns at a time there.
Re: Help with MapReduce
http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts On Tue, Apr 20, 2010 at 12:48 AM, Joost Ouwerkerk jo...@openplaces.org wrote: Ok. This should be ok for now, although not optimal for some jobs. Next issue is node stability during the insert job. ... (rest of the quoted message, including the stack trace, is identical to the previous post and omitted)