Re: Invalid argument
Hi Aaron. Here is my java -version:

java version "1.6.0_35"
Java(TM) SE Runtime Environment (build 1.6.0_35-b10)
Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01, mixed mode)

Thanks for the workaround; setting disk_access_mode: standard worked.

Alain

2012/11/19 aaron morton aa...@thelastpickle.com

Are you running a 32-bit JVM? What is the full JVM version?

As a workaround you can try disabling memory-mapped access: set disk_access_mode to standard.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 20/11/2012, at 6:27 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

I have backed up production sstables from one of my 3 production nodes (RF=3) and I want to use them on my dev environment (C* 1.1.6 on both environments). My dev server is a 4-core, 4 GB RAM machine running Ubuntu.

I applied the production schema on my dev node, copied all the sstables into the appropriate folder and restarted my node like I always do. But this time I got the following error (many times, and only for one column family):

INFO [SSTableBatchOpen:4] 2012-11-19 17:52:52,980 SSTableReader.java (line 169) Opening /var/lib/cassandra/data/cassa_teads/data_action/cassa_teads-data_action-hf-660 (7015417424 bytes)
ERROR [SSTableBatchOpen:3] 2012-11-19 17:53:17,259 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[SSTableBatchOpen:3,5,main]
java.io.IOError: java.io.IOException: Invalid argument
    at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:202)
    at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.complete(MmappedSegmentedFile.java:179)
    at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:429)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:200)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:153)
    at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:242)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Invalid argument
    at sun.nio.ch.FileChannelImpl.truncate0(Native Method)
    at sun.nio.ch.FileChannelImpl.map(Unknown Source)
    at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:194)
    ... 11 more

If I try with nodetool refresh I get the following error:

Exception in thread main java.io.IOError: java.io.IOException: Invalid argument
    at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:202)
    at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.complete(MmappedSegmentedFile.java:179)
    at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:429)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:200)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:153)
    at org.apache.cassandra.db.ColumnFamilyStore.loadNewSSTables(ColumnFamilyStore.java:510)
    at org.apache.cassandra.db.ColumnFamilyStore.loadNewSSTables(ColumnFamilyStore.java:468)
    at org.apache.cassandra.service.StorageService.loadNewSSTables(StorageService.java:3089)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Unknown Source)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(Unknown Source)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(Unknown Source)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(Unknown Source)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(Unknown Source)
    at javax.management.remote.rmi.RMIConnectionImpl.doOperation(Unknown Source)
    at javax.management.remote.rmi.RMIConnectionImpl.access$200(Unknown Source)
    at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(Unknown Source)
    at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(Unknown Source)
    at
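For reference, the workaround Aaron suggests is a one-line change in cassandra.yaml, followed by a node restart. The setting and its accepted values are from the 1.1 configuration:

```yaml
# cassandra.yaml (Cassandra 1.1)
#   auto (default):  mmap data and index files where possible
#   mmap_index_only: mmap index files only
#   standard:        plain buffered I/O, which avoids the
#                    FileChannel.map() call failing above
disk_access_mode: standard
```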
Re: Looking for a good Ruby client
As the author of Cequel, I can assure you it is excellent ; ) We use it in production at Brewster and it is quite stable. If you try it out and find any bugs, we'll fix 'em quickly.

I'm planning a big overhaul of the model layer over the holidays to expose all the new data modeling goodness in CQL3 (while still retaining compatibility with CQL2 structures).

On Thu, Nov 15, 2012 at 3:42 PM, Harry Wilkinson hwilkin...@mdsol.com wrote:

Update on this: someone just pointed me towards the Cequel gem: https://github.com/brewster/cequel

The way it's described in the readme, it looks like exactly what I was looking for - a modern, CQL-based gem that is in active development and also follows the ActiveModel pattern. I'd be very interested to hear if anybody has used this, whether it's stable/reliable, etc.

Thanks.

Harry

On 2 August 2012 00:31, Thorsten von Eicken t...@rightscale.com wrote:

Harry, we're in a similar situation and are starting to work out our own Ruby client. The biggest issue is that it doesn't make much sense to build a higher-level abstraction on anything other than CQL3, given where things are headed - at least this is our opinion. At the same time, CQL3 is just barely becoming usable and still seems rather deficient in wide-row usage.

The tricky part is that with the current CQL3 you have to construct quite complex iterators to retrieve a large result set, which means that you end up having to either parse incoming CQL3 to insert the iteration logic, or pass CQL3 fragments in and compose them together with iterator clauses. Not fun either way. The only good solution I see is to switch to a streaming protocol (or build some form of "continue" on top of thrift) such that the client can ask for a huge result set and the Cassandra coordinator can break it into sub-queries as it sees fit and return results chunk by chunk.

If this is really the path forward, then all abstractions built above CQL3 before that will either have a good piece of complex code that can be deleted or, worse, an interface that is no longer best practice. Good luck!

Thorsten

On 8/1/2012 1:47 PM, Harry Wilkinson wrote:

Hi, I'm looking for a Ruby client for Cassandra that is pretty high-level. I am really hoping to find a Ruby gem of high quality that allows a developer to create models like you would with ActiveModel.

So far I have figured out that the canonical Ruby client for Cassandra is Twitter's Cassandra gem of the same name. It looks great - mature, still in active development, etc. No stated support for Ruby 1.9.3 that I can see, but I can probably live with that for now. What I'm looking for is a higher-level gem built on that gem that works like ActiveModel, in that you just include a module in your model class and that gives you methods to declare your model's serialized attributes and also the usual ActiveModel methods like 'save!', 'valid?', 'find', etc.

I've been trying out some different NoSQL databases recently. For example, there is an official Ruby client for Riak with a domain model that is close to Riak's, but there's also a gem called 'Ripple' that uses a domain model closer to what most Ruby developers are used to. So it looks like Twitter's Cassandra gem is the one that stays close to the domain model of Cassandra, and what I'm looking for is a gem that's the Cassandra equivalent of Ripple.

From some searching I found cassandra_object, which has been inactive for a couple of years. There's a fork that looks like it's being maintained, but I have not found any information to suggest the maintained fork is in general use yet. I have found quite a lot of gems in a similar style that people have started and then not got very far with.

So, does anybody know of a suitable gem? Would you recommend it? Or perhaps you would recommend not using such a gem and sticking with the lower-level client gem?

Thanks in advance for your advice.

Harry
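The ActiveModel-style pattern Harry describes - include a module, declare serialized attributes, and get 'save!', 'valid?' and 'find' for free - can be sketched in plain Ruby. This is purely an illustration of the pattern with a hypothetical in-memory store standing in for a Cassandra-backed adapter; it is not Cequel's or any gem's actual API:

```ruby
# Illustrative sketch of the ActiveModel-style pattern discussed above.
# MiniModel and its in-memory Store are hypothetical names.
module MiniModel
  def self.included(base)
    base.extend(ClassMethods)
    base.instance_variable_set(:@store, {}) # stand-in for a real backend
  end

  module ClassMethods
    attr_reader :store

    # Declare serialized attributes, ActiveModel-style.
    def attribute(*names)
      attr_accessor(*names)
      (@attributes ||= []).concat(names)
    end

    def attributes
      @attributes || []
    end

    def find(id)
      store.fetch(id) { raise "record not found: #{id}" }
    end
  end

  # Stand-in validation: every declared attribute must be set.
  def valid?
    self.class.attributes.all? { |a| !send(a).nil? }
  end

  def save!
    raise "invalid record" unless valid?
    self.class.store[id] = self
  end
end

class Post
  include MiniModel
  attribute :id, :title
end

post = Post.new
post.id = 1
post.title = "Hello Cassandra"
post.save!
```

A real gem would replace the class-level hash with CQL reads and writes, but the module-inclusion interface is what the ActiveModel pattern buys you.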
Re: Looking for a good Ruby client
@Mat Well, I guess you could add your Ruby client to this list, since there are not a lot of them yet: http://wiki.apache.org/cassandra/ClientOptions

Alain

2012/11/20 Mat Brown m...@brewster.com

As the author of Cequel, I can assure you it is excellent ; ) We use it in production at Brewster and it is quite stable. [...]
Re: Looking for a good Ruby client
@Mat Brown: "(while still retaining compatibility with CQL2 structures)" - do you mean by exceeding what Cassandra itself provides in terms of CQL2/3 interoperability?

I'm looking into something similar currently (however in Java, not in Ruby) and would be interested in your experiences if you follow through with the plan. Do you have a blog?

Thanks!

2012/11/20 Alain RODRIGUEZ arodr...@gmail.com

@Mat Well, I guess you could add your Ruby client to this list, since there are not a lot of them yet: http://wiki.apache.org/cassandra/ClientOptions [...]
Re: Looking for a good Ruby client
Hi Timmy,

I haven't done a lot of playing with CQL3 yet, mostly just reading the blog posts, so the following is subject to change : )

Right now, the Cequel model layer has a skinny-row model (which is designed to follow common patterns of Ruby ORMs) and a wide-row model (which is designed to behave more or less like a Hash, the equivalent of Java's HashMap). The two don't integrate with each other in any meaningful way, but as far as I understand it, they pretty much cover the data modeling possibilities in CQL2.

The big idea I've got for the overhaul of Cequel for CQL3 is to allow building a rich, nested data model by integrating different flavors of CQL3 table, most notably multi-column primary keys, as well as collections. The core data types I have in mind are:

1) Skinny row with simple primary key (e.g. blogs, with blog_id key)
2) Skinny row with complex primary key (e.g. blog_posts, with (blog_id, post_id) key)
3) Wide row with simple primary key (e.g. blog_languages - kind of a weak example, but I can't think of anything better for a blog : )
4) Wide row with complex primary key (e.g. blog_post_tags)

My goal is to make it easy to model one-to-one relationships via a shared primary key, and one-to-many via a shared prefix of the primary key. So, for instance, blogs and blog_languages rows would be one-to-one (both with a blog_id primary key) and blogs and blog_posts would be one-to-many (sharing the blog_id prefix in the primary key).

From what I've read, it seems fairly clear that the actual CQL used to interact with #1 will be the same for CQL2 column families and CQL3 tables, so no explicit backward compatibility would be needed. #2 and #4 are, of course, CQL3-only, so backward compatibility isn't an issue there either. What I'm not entirely clear on is #3 - this is straightforward in CQL2, and presumably a CQL3 table with compact storage would behave in the same way. However, my understanding so far is that a non-compact CQL3 table would treat this structure differently, in that both the key and value of the map would correspond to columns in a CQL3 table. It may make more sense to just target compact-storage tables with this data structure, but I'm going to need to play around with it more to figure that out. Otherwise, Cequel will need to provide two flavors of that structure.

There's also some tension between CQL3 collections and just using traditional wide-row structures to achieve the same thing. For instance, blog_tags could also just be a tags collection in the blogs table. My plan at this point is to offer both options, since each has its advantages (collections don't require the creation of a separate table, but a separate table gives you access to slices of the collection).

Anyway, that's probably a lot more of an answer than you needed, but hopefully the context helps. Definitely interested to hear about the direction you take your client in as well.

Finally, regarding a blog, we've got one set up, but it's not live yet. I'll ping you with a link when it is; I'll certainly be posting on the development of the next Cequel release.

Cheers,
Mat

On Tue, Nov 20, 2012 at 9:23 AM, Timmy Turner timm.t...@gmail.com wrote:

@Mat Brown: "(while still retaining compatibility with CQL2 structures)" - do you mean by exceeding what Cassandra itself provides in terms of CQL2/3 interoperability? [...]
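The four core data types Mat lists map naturally onto CQL3 schemas. A hypothetical sketch using his blog examples (the key columns come from his list; the non-key columns are invented for illustration):

```sql
-- 1) Skinny row, simple primary key
CREATE TABLE blogs (
  blog_id uuid PRIMARY KEY,
  name text
);

-- 2) Skinny row, complex primary key: one row per post, clustered by post_id
CREATE TABLE blog_posts (
  blog_id uuid,
  post_id timeuuid,
  title text,
  body text,
  PRIMARY KEY (blog_id, post_id)
);

-- 3) Wide row, simple primary key: behaves like a Hash of language -> enabled.
--    This is the ambiguous case Mat mentions; with COMPACT STORAGE it matches
--    the CQL2 wide-row layout.
CREATE TABLE blog_languages (
  blog_id uuid,
  language text,
  enabled boolean,
  PRIMARY KEY (blog_id, language)
) WITH COMPACT STORAGE;

-- 4) Wide row, complex primary key
CREATE TABLE blog_post_tags (
  blog_id uuid,
  post_id timeuuid,
  tag text,
  PRIMARY KEY (blog_id, post_id, tag)
);
```

The one-to-many relationship he describes falls out of the shared blog_id prefix: a slice query on blog_posts with a fixed blog_id returns all posts for that blog.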
Re: Upgrade 1.1.2 - 1.1.6
Alain,

My understanding is that drain ensures that all memtables are flushed, so that there is no data in the commitlog that isn't in an sstable. A marker is saved that indicates the commitlogs should not be replayed. Commitlogs are only removed from disk periodically (after commitlog_total_space_in_mb is exceeded?). With 1.1.5/6, all nanotime-stamped commitlogs are replayed on startup regardless of whether they've been flushed, so in our case manually removing all the commitlogs after a drain was the only way to prevent their replay.

Mike

On Tue, Nov 20, 2012 at 5:19 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

@Mike I am glad to see I am not the only one with this issue (even if I am sorry it happened to you, of course). Isn't drain supposed to clear the commitlogs? Did removing them work properly? In his warning to C* users, Jonathan Ellis said that a drain would avoid this issue; it seems like it doesn't.

@Rob You understood precisely the 2 issues I met during the upgrade. I am sad to see that neither of them is yet resolved, and probably won't be.

2012/11/20 Mike Heffner m...@librato.com

Alain,

We performed a 1.1.3 -> 1.1.6 upgrade and found that all the logs replayed regardless of the drain. After noticing this on the first node, we did the following:

* nodetool flush
* nodetool drain
* service cassandra stop
* mv /path/to/logs/*.log /backup/
* apt-get install cassandra (restarts automatically)

I also agree that starting C* after an upgrade/install seems quite broken if it was already stopped before the install. However annoying, I have found this to be the default for most Ubuntu daemon packages.

Mike

On Thu, Nov 15, 2012 at 9:21 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

We had an issue with counters over-counting even using the nodetool drain command before upgrading... Here is my bash history:

69   cp /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
70   cp /etc/cassandra/cassandra-env.sh /etc/cassandra/cassandra-env.sh.bak
71   sudo apt-get install cassandra
72   nodetool disablethrift
73   nodetool drain
74   service cassandra stop
75   cat /etc/cassandra/cassandra-env.sh /etc/cassandra/cassandra-env.sh.bak
76   vim /etc/cassandra/cassandra-env.sh
77   cat /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
78   vim /etc/cassandra/cassandra.yaml
79   service cassandra start

So I think I followed these steps: http://www.datastax.com/docs/1.1/install/upgrading#upgrade-steps (I merged my conf files with an external tool, so consider that I merged my conf files on steps 76 and 78.)

I saw that "sudo apt-get install cassandra" stops the server and restarts it automatically. So it updated without draining, and restarted before I had the time to reconfigure the conf files. Is this normal? Is there a way to avoid it?

So for the second node I decided to try to stop C* before the upgrade:

125  cp /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
126  cp /etc/cassandra/cassandra-env.sh /etc/cassandra/cassandra-env.sh.bak
127  nodetool disablegossip
128  nodetool disablethrift
129  nodetool drain
130  service cassandra stop
131  sudo apt-get install cassandra   // 131: this restarted cassandra
132  nodetool disablethrift
133  nodetool disablegossip
134  nodetool drain
135  service cassandra stop
136  cat /etc/cassandra/cassandra-env.sh /etc/cassandra/cassandra-env.sh.bak
137  cim /etc/cassandra/cassandra-env.sh
138  vim /etc/cassandra/cassandra-env.sh
139  cat /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
140  vim /etc/cassandra/cassandra.yaml
141  service cassandra start

After both of these updates I saw my current counters increase without any reason. Did I do anything wrong?

Alain

--
Mike Heffner m...@librato.com
Librato, Inc.
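Combining Mike's step list with the lesson from Alain's history (the package restarts the node on its own, so the drain must happen before the install), the ordering can be sketched as a script. The nodetool and service commands are real; the script itself, the commitlog path, and the /backup destination are assumptions to adapt. DRY_RUN defaults to on, so it only prints what it would do:

```shell
#!/bin/sh
# Hypothetical pre-upgrade drain helper for the 1.1.x Ubuntu packages.
# DRY_RUN=1 (the default here) prints each step instead of executing it.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

drain_and_upgrade() {
  run nodetool disablethrift   # stop accepting client requests
  run nodetool disablegossip   # leave the ring
  run nodetool drain           # flush memtables, mark commitlogs flushed
  run service cassandra stop
  # 1.1.5/6 replays pre-upgrade commitlogs regardless of the drain, so
  # move them aside -- only after a clean drain! Paths are assumptions.
  run mv /var/lib/cassandra/commitlog/*.log /backup/
  run apt-get install cassandra  # NB: the package restarts Cassandra itself
}

drain_and_upgrade
```

Run with DRY_RUN=0 only once the printed sequence looks right for your install.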
Re: Datastax Java Driver
Great!

2012/11/20 michael.figui...@gmail.com michael.figui...@gmail.com

The Apache Cassandra project has traditionally not focused on the client side. Rather than modifying the scope of the project and jeopardizing the current driver ecosystem, we've preferred to open-source it this way. Note that this driver's license is Apache License 2 and it will remain so, making it easy to integrate into most open source projects with no license issues.

On Tue, Nov 20, 2012 at 1:19 AM, Timmy Turner timm.t...@gmail.com wrote:

Why is this being released as a separate project, instead of being bundled with Cassandra? Is it not a part of Cassandra?

2012/11/19 John Sanda john.sa...@gmail.com

Fantastic! As for the object mapping API, has there been any discussion/consideration of http://www.hibernate.org/subprojects/ogm.html?

On Mon, Nov 19, 2012 at 1:50 PM, Sylvain Lebresne sylv...@datastax.com wrote:

Everyone,

We've just open-sourced a new Java driver we have been working on here at DataStax. This driver is CQL3-only and is built to use the new binary protocol that will be introduced with Cassandra 1.2. It will thus only work with Cassandra 1.2 onwards; currently, that means testing it requires 1.2.0-beta2. This is also alpha software at this point. You are welcome to try and play with it and we would very much welcome feedback, but be sure that break, it will.

The driver is accessible at: http://github.com/datastax/java-driver

Today we're open-sourcing the core part of this driver. The main goal of this core module is to handle connections to the Cassandra cluster with all the features that one would expect. The currently supported features are:
- Asynchronous: the driver uses the new CQL binary protocol's asynchronous capabilities.
- Node discovery.
- Configurable load balancing/routing.
- Transparent fail-over.
- C* tracing handling.
- Convenient schema access.
- Configurable retry policy.

This core module provides a simple low-level API (that works directly with query strings). We plan to release a higher-level, thin object-mapping API built on top of this core shortly. Please refer to the project README for more information.

--
The DataStax Team

--
- John

--
Jérémy
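Since the core module works directly with query strings, basic usage of the announced driver might look like the sketch below. This is an illustration based on the announcement and README of the alpha at the time; as Sylvain warns, the API may well change, and the keyspace and query are invented:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class DriverSketch {
    public static void main(String[] args) {
        // One or more contact points; the driver discovers the rest of
        // the ring itself (node discovery is a listed feature).
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .build();
        Session session = cluster.connect("my_keyspace"); // hypothetical keyspace

        // The core API takes plain CQL3 query strings.
        ResultSet rs = session.execute("SELECT id, name FROM users");
        for (Row row : rs) {
            System.out.println(row.getString("name"));
        }

        cluster.shutdown();
    }
}
```

Load balancing, retry policy, and fail-over are configured on the Cluster builder rather than per query, which keeps the request path itself this simple.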
Re: Upgrade 1.1.2 - 1.1.6
On Mon, Nov 19, 2012 at 7:18 PM, Mike Heffner m...@librato.com wrote:

We performed a 1.1.3 -> 1.1.6 upgrade and found that all the logs replayed regardless of the drain.

Your experience and desire for different (expected) behavior is welcomed on:

https://issues.apache.org/jira/browse/CASSANDRA-4446 ("nodetool drain sometimes doesn't mark commitlog fully flushed")

If every production operator who experiences this issue shares their experience on this bug, perhaps the project will acknowledge and address it.

=Rob

--
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb
Re: Upgrade 1.1.2 - 1.1.6
On Tue, Nov 20, 2012 at 2:49 PM, Rob Coli rc...@palominodb.com wrote:

Your experience and desire for different (expected) behavior is welcomed on: https://issues.apache.org/jira/browse/CASSANDRA-4446 ("nodetool drain sometimes doesn't mark commitlog fully flushed"). [...]

Well, in this case I think our issue was that upgrading from nanotime to epoch-based commitlog timestamps, by definition, replays all commit logs. That's not due to any specific problem with nodetool drain not marking commitlogs flushed, but a safety measure to ensure data is not lost due to buggy nanotime implementations. For us, the problem was that the pre-1.1.5 -> 1.1.6 upgrade instructions didn't mention that commitlogs should be removed after a successful drain. On the other hand, we do not use counters, so replaying them was merely a much longer MTT-Return after restarting with 1.1.6.

Mike

--
Mike Heffner m...@librato.com
Librato, Inc.
Re: Query regarding SSTable timestamps and counts
My understanding of the compaction process was that since data files keep continuously merging we should not have data files with very old last modified timestamps It is perfectly OK to have very old SSTables. But performing an upgradesstables did decrease the number of data files and removed all the data files with the old timestamps. upgradetables re-writes every sstable to have the same contents in the newest format. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 19/11/2012, at 4:57 PM, Ananth Gundabattula agundabatt...@gmail.com wrote: Hello Aaron, Thanks a lot for the reply. Looks like the documentation is confusing. Here is the link I am referring to: http://www.datastax.com/docs/1.1/operations/tuning#tuning-compaction It does not disable compaction. As per the above url, After running a major compaction, automatic minor compactions are no longer triggered, frequently requiring you to manually run major compactions on a routine basis. ( Just before the heading Tuning Column Family compression in the above link) With respect to the replies below : it creates one big file, which will not be compacted until there are (by default) 3 other very big files. This is for the minor compaction and major compaction should theoretically result in one large file irrespective of the number of data files initially? This is not something you have to worry about. Unless you are seeing 1,000's of files using the default compaction. Well my worry has been because of the large amount of node movements we have done in the ring. We started off with 6 nodes and increased the capacity to 12 with disproportionate increases every time which resulted in a lot of clean of data folders except system, run repair and then a cleanup with an aborted attempt in between. There were some data.db files older by more than 2 weeks and were not modified since then. 
My understanding of the compaction process was that since data files keep continuously merging we should not have data files with very old last modified timestamps (assuming there is a good amount of writes to the table continuously) I did not have a sure way of telling if everything is alright with the compaction looking at the last modified timestamps of all the data.db files. What are the compaction issues you are having ? Your replies confirm that the timestamps should not be an issue to worry about. So I guess I should not be calling them issues any more. But performing an upgradesstables did decrease the number of data files and removed all the data files with the old timestamps. Regards, Ananth On Mon, Nov 19, 2012 at 6:54 AM, aaron morton aa...@thelastpickle.com wrote: As per datastax documentation, a manual compaction forces the admin to start compaction manually and disables the automated compaction (at least for major compactions but not minor compactions ) It does not disable compaction. it creates one big file, which will not be compacted until there are (by default) 3 other very big files. 1. Does a nodetool stop compaction also force the admin to manually run major compaction ( i.e. disable automated major compactions ? ) No. Stop just stops the current compaction. Nothing is disabled. 2. Can a node restart reset the automated major compaction if a node gets into a manual mode compaction for whatever reason ? Major compaction is not automatic. It is the manual nodetool compact command. Automatic (minor) compaction is controlled by min_compaction_threshold and max_compaction_threshold (for the default compaction strategy). 3. What is the ideal number of SSTables for a table in a keyspace ( I mean are there any indicators as to whether my compaction is alright or not ? ) This is not something you have to worry about. Unless you are seeing 1,000's of files using the default compaction. 
For example, I have seen SSTables on the disk more than 10 days old wherein there were other SSTables belonging to the same table but much younger than the older SSTables ( No problems. 4. Does a upgradesstables fix any compaction issues ? What are the compaction issues you are having ? Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 18/11/2012, at 1:18 AM, Ananth Gundabattula agundabatt...@gmail.com wrote: We have a cluster running cassandra 1.1.4. On this cluster, 1. We had to move the nodes around a bit when we were adding new nodes (there was quite a good amount of node movement ) 2. We had to stop compactions during some of the days to save some disk space on some of the nodes when they were running very very low on disk spaces. (via nodetool stop COMPACTION) As per datastax documentation,
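For anyone wanting to repeat the check Ananth describes, inspecting the last-modified times of -Data.db files can be scripted. This is purely an illustrative helper (the directory layout and the 2-week threshold are assumptions, not anything Cassandra ships):

```python
import os
import time

def old_data_files(data_dir, max_age_days=14):
    """Return -Data.db files under data_dir not modified within max_age_days."""
    cutoff = time.time() - max_age_days * 86400
    return sorted(
        os.path.join(root, name)
        for root, _dirs, names in os.walk(data_dir)
        for name in names
        if name.endswith("-Data.db")
        and os.path.getmtime(os.path.join(root, name)) < cutoff
    )

# e.g. old_data_files("/var/lib/cassandra/data/my_keyspace")
```

As the thread concludes, old files are not a problem by themselves; this only tells you which SSTables have sat out of compaction for a while.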
Re: Query regarding SSTable timestamps and counts
On Tue, Nov 20, 2012 at 5:23 PM, aaron morton aa...@thelastpickle.com wrote: My understanding of the compaction process was that since data files keep continuously merging we should not have data files with very old last modified timestamps It is perfectly OK to have very old SSTables. But performing an upgradesstables did decrease the number of data files and removed all the data files with the old timestamps. upgradetables re-writes every sstable to have the same contents in the newest format. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 19/11/2012, at 4:57 PM, Ananth Gundabattula agundabatt...@gmail.com wrote: Hello Aaron, Thanks a lot for the reply. Looks like the documentation is confusing. Here is the link I am referring to: http://www.datastax.com/docs/1.1/operations/tuning#tuning-compaction It does not disable compaction. As per the above url, After running a major compaction, automatic minor compactions are no longer triggered, frequently requiring you to manually run major compactions on a routine basis. ( Just before the heading Tuning Column Family compression in the above link) With respect to the replies below : it creates one big file, which will not be compacted until there are (by default) 3 other very big files. This is for the minor compaction and major compaction should theoretically result in one large file irrespective of the number of data files initially? This is not something you have to worry about. Unless you are seeing 1,000's of files using the default compaction. Well my worry has been because of the large amount of node movements we have done in the ring. We started off with 6 nodes and increased the capacity to 12 with disproportionate increases every time which resulted in a lot of clean of data folders except system, run repair and then a cleanup with an aborted attempt in between. 
There were some data.db files older by more than 2 weeks and were not modified since then. My understanding of the compaction process was that since data files keep continuously merging we should not have data files with very old last modified timestamps (assuming there is a good amount of writes to the table continuously) I did not have a for sure way of telling if everything is alright with the compaction looking at the last modified timestamps of all the data.db files. What are the compaction issues you are having ? Your replies confirm that the timestamps should not be an issue to worry about. So I guess I should not be calling them as issues any more. But performing an upgradesstables did decrease the number of data files and removed all the data files with the old timestamps. Regards, Ananth On Mon, Nov 19, 2012 at 6:54 AM, aaron morton aa...@thelastpickle.com wrote: As per datastax documentation, a manual compaction forces the admin to start compaction manually and disables the automated compaction (atleast for major compactions but not minor compactions ) It does not disable compaction. it creates one big file, which will not be compacted until there are (by default) 3 other very big files. 1. Does a nodetool stop compaction also force the admin to manually run major compaction ( I.e. disable automated major compactions ? ) No. Stop just stops the current compaction. Nothing is disabled. 2. Can a node restart reset the automated major compaction if a node gets into a manual mode compaction for whatever reason ? Major compaction is not automatic. It is the manual nodetool compact command. Automatic (minor) compaction is controlled by min_compaction_threshold and max_compaction_threshold (for the default compaction strategy). 3. What is the ideal number of SSTables for a table in a keyspace ( I mean are there any indicators as to whether my compaction is alright or not ? ) This is not something you have to worry about. 
Unless you are seeing 1,000's of files using the default compaction. For example, I have seen SSTables on the disk more than 10 days old wherein there were other SSTables belonging to the same table but much younger than the older SSTables ( No problems. 4. Does a upgradesstables fix any compaction issues ? What are the compaction issues you are having ? Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 18/11/2012, at 1:18 AM, Ananth Gundabattula agundabatt...@gmail.com wrote: We have a cluster running cassandra 1.1.4. On this cluster, 1. We had to move the nodes around a bit when we were adding new nodes (there was quite a good amount of node movement ) 2. We had to stop compactions during some of the days to save some disk space on some of the nodes when they were running very very low on disk spaces. (via nodetool stop COMPACTION) As per datastax documentation, a manual
Re: Query regarding SSTable timestamps and counts
Thanks a lot Aaron and Edward. The mail thread clarifies some things for me. For letting others know on this thread, running an upgradesstables did decrease our bloom filter false positive ratios a lot. ( upgradesstables was run not to upgrade from a Cassandra version to a higher Cassandra version but because of all the node movement we had done to upgrade our cluster in a staggered way with aborted attempts in between and I understand that upgradesstables was not necessarily required for the high bloom filter false positive rates we were seeing ) Regards, Ananth On Wed, Nov 21, 2012 at 9:45 AM, Edward Capriolo edlinuxg...@gmail.com wrote: On Tue, Nov 20, 2012 at 5:23 PM, aaron morton aa...@thelastpickle.com wrote: My understanding of the compaction process was that since data files keep continuously merging we should not have data files with very old last modified timestamps It is perfectly OK to have very old SSTables. But performing an upgradesstables did decrease the number of data files and removed all the data files with the old timestamps. upgradesstables re-writes every sstable to have the same contents in the newest format. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 19/11/2012, at 4:57 PM, Ananth Gundabattula agundabatt...@gmail.com wrote: Hello Aaron, Thanks a lot for the reply. Looks like the documentation is confusing. Here is the link I am referring to: http://www.datastax.com/docs/1.1/operations/tuning#tuning-compaction It does not disable compaction. As per the above url, After running a major compaction, automatic minor compactions are no longer triggered, frequently requiring you to manually run major compactions on a routine basis. ( Just before the heading Tuning Column Family compression in the above link) With respect to the replies below : it creates one big file, which will not be compacted until there are (by default) 3 other very big files. 
This is for the minor compaction and major compaction should theoretically result in one large file irrespective of the number of data files initially? This is not something you have to worry about. Unless you are seeing 1,000's of files using the default compaction. Well my worry has been because of the large amount of node movements we have done in the ring. We started off with 6 nodes and increased the capacity to 12 with disproportionate increases every time which resulted in a lot of clean of data folders except system, run repair and then a cleanup with an aborted attempt in between. There were some data.db files older by more than 2 weeks and were not modified since then. My understanding of the compaction process was that since data files keep continuously merging we should not have data files with very old last modified timestamps (assuming there is a good amount of writes to the table continuously) I did not have a for sure way of telling if everything is alright with the compaction looking at the last modified timestamps of all the data.db files. What are the compaction issues you are having ? Your replies confirm that the timestamps should not be an issue to worry about. So I guess I should not be calling them as issues any more. But performing an upgradesstables did decrease the number of data files and removed all the data files with the old timestamps. Regards, Ananth On Mon, Nov 19, 2012 at 6:54 AM, aaron morton aa...@thelastpickle.com wrote: As per datastax documentation, a manual compaction forces the admin to start compaction manually and disables the automated compaction (atleast for major compactions but not minor compactions ) It does not disable compaction. it creates one big file, which will not be compacted until there are (by default) 3 other very big files. 1. Does a nodetool stop compaction also force the admin to manually run major compaction ( I.e. disable automated major compactions ? ) No. Stop just stops the current compaction. 
Nothing is disabled. 2. Can a node restart reset the automated major compaction if a node gets into a manual mode compaction for whatever reason ? Major compaction is not automatic. It is the manual nodetool compact command. Automatic (minor) compaction is controlled by min_compaction_threshold and max_compaction_threshold (for the default compaction strategy). 3. What is the ideal number of SSTables for a table in a keyspace ( I mean are there any indicators as to whether my compaction is alright or not ? ) This is not something you have to worry about. Unless you are seeing 1,000's of files using the default compaction. For example, I have seen SSTables on the disk more than 10 days old wherein there were other SSTables belonging to the same table but much
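Aaron's point — that the single big file produced by a major compaction sits untouched until enough similar-sized files exist — comes from size-tiered bucketing. Below is a rough, hypothetical sketch of that selection logic (not Cassandra's actual SizeTieredCompactionStrategy code; the 0.5/1.5 bucket bounds and min_threshold of 4 mirror the commonly cited defaults):

```python
def buckets(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group sstable sizes into tiers of 'similar enough' size."""
    tiers = []
    for size in sorted(sizes):
        for tier in tiers:
            avg = sum(tier) / len(tier)
            if bucket_low * avg <= size <= bucket_high * avg:
                tier.append(size)
                break
        else:
            tiers.append([size])
    return tiers

def compactable(sizes, min_threshold=4):
    """Tiers that hold enough files to be eligible for a minor compaction."""
    return [t for t in buckets(sizes) if len(t) >= min_threshold]

# One 10 GiB file from a major compaction: alone in its tier, never picked.
print(compactable([10 * 2**30]))                      # []
# Four similar small files form a full tier and would be compacted.
print(compactable([100, 100, 100, 100, 10 * 2**30]))  # [[100, 100, 100, 100]]
```

This also illustrates why a major-compacted file is only re-compacted once several other "very big" files accumulate in its size tier.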
Re: row cache re-fill very slow
INFO [OptionalTasks:1] 2012-11-19 13:08:58,868 ColumnFamilyStore.java (line 451) completed loading (5175655 ms; 13259976 keys) row cache So it was reading 2,562 rows per second during startup. I'd say that's not unreasonable performance for 13 million rows. It will get faster in 1.2, but for now just have the cache save less keys perhaps. Would something like iterating over SSTables instead, and throwing rows at the cache that need to be in there feasible ? During start up we do not read the -Data.db component of the SStable, only the -Index.db (and -Filter.db) component. Also the SSTables are opened in parallel. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 20/11/2012, at 10:39 AM, Andras Szerdahelyi andras.szerdahe...@ignitionone.com wrote: Aaron, What version are you on ? 1.1.5 Do you know how many rows were loaded ? INFO [OptionalTasks:1] 2012-11-19 13:08:58,868 ColumnFamilyStore.java (line 451) completed loading (5175655 ms; 13259976 keys) row cache In both cases I do not believe the cache is stored in token (or key) order. Am i getting this right: the row keys are read and rows are retrieved from SSTables in the order their keys are in the cache file.. Would something like iterating over SSTables instead, and throwing rows at the cache that need to be in there feasible ? If the SSTables themselves are written sequentially at compaction time , which is how i remember they are written, SSTable-sized sequential reads with a filter ( bloom filter for the row cache? :-) ) must be faster than reading from all across the column family ( i have HDDs and about 1k SSTables ) row_cache_keys_to_save in yaml may help you find a happy half way point. If i can keep that high enough, with my data retention requirements, save for the absolute first get on a row, i can operate entirely out of memory. thanks! 
Andras Andras Szerdahelyi Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A M: +32 493 05 50 88 | Skype: sandrew84 On 19 Nov 2012, at 22:00, aaron morton aa...@thelastpickle.com wrote: i was just wondering if anyone else is experiencing very slow ( ~ 3.5 MB/sec ) re-fill of the row cache at start up. It was mentioned the other day. What version are you on ? Do you know how many rows were loaded ? When complete it will log a message with the pattern completed loading (%d ms; %d keys) row cache for %s.%s How is the saved row cache file processed? In Version 1.1, after the SSTables have been opened the keys in the saved row cache are read one at a time and the whole row read into memory. This is a single threaded operation. In 1.2 reading the saved cache is still single threaded, but reading the rows goes through the read thread pool so is in parallel. In both cases I do not believe the cache is stored in token (or key) order. ( Admittedly whatever is going on is still much more preferable to starting with a cold row cache ) row_cache_keys_to_save in yaml may help you find a happy half way point. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 20/11/2012, at 3:17 AM, Andras Szerdahelyi andras.szerdahe...@ignitionone.com wrote: Hey list, i was just wondering if anyone else is experiencing very slow ( ~ 3.5 MB/sec ) re-fill of the row cache at start up. We operate with a large row cache ( 10-15GB currently ) and we already measure startup times in hours :-) How is the saved row cache file processed? Are the cached row keys simply iterated over and their respective rows read from SSTables - possibly creating random reads with small enough sstable files, if the keys were not stored in a manner optimised for a quick re-fill ? - or is there a smarter algorithm ( i.e. 
scan through one sstable at a time, filter rows that should be in row cache ) at work and this operation is purely disk i/o bound ? ( Admittedly whatever is going on is still much more preferable to starting with a cold row cache ) thanks! Andras Andras Szerdahelyi Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A M: +32 493 05 50 88 | Skype: sandrew84
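As a sanity check on Aaron's 2,562 rows/sec figure, it falls straight out of the numbers in the completed loading log line:

```python
# Values taken from "completed loading (5175655 ms; 13259976 keys) row cache".
ms, keys = 5_175_655, 13_259_976

rate = keys / (ms / 1000)   # keys loaded per second
print(round(rate))          # 2562

print(round(ms / 1000 / 3600, 2))  # 1.44 -- hours spent refilling the cache
```

That roughly 1.4-hour refill for 13.3M keys is consistent with the multi-hour startups Andras reports for a 10-15 GB cache.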
Re: Invalid argument
Thanks for the work around, setting disk_access_mode: standard worked. hmmm, it's only a work around. If you can reproduce the fault could you report it on https://issues.apache.org/jira/browse/CASSANDRA Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 20/11/2012, at 11:03 PM, Alain RODRIGUEZ arodr...@gmail.com wrote: Hi Aaron. Here is my java -version java version 1.6.0_35 Java(TM) SE Runtime Environment (build 1.6.0_35-b10) Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01, mixed mode) Thanks for the work around, setting disk_access_mode: standard worked. Alain 2012/11/19 aaron morton aa...@thelastpickle.com Are you running a 32 bit JVM ? What is the full JVM version ? As a work around you can try disabling memory mapped access set disk_access_mode to standard. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 20/11/2012, at 6:27 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: I have backed up production sstables from one of my 3 production nodes (RF=3) and I want to use them on my dev environment. (C* 1.1.6 on both environments) My dev server is 4-core, 4 GB RAM hardware running on Ubuntu. I have applied the production schema on my dev node, copied all sstables into the appropriate folder and restarted my node like I always do. 
But this time I have had the following error (many times and only for ) :

INFO [SSTableBatchOpen:4] 2012-11-19 17:52:52,980 SSTableReader.java (line 169) Opening /var/lib/cassandra/data/cassa_teads/data_action/cassa_teads-data_action-hf-660 (7015417424 bytes)
ERROR [SSTableBatchOpen:3] 2012-11-19 17:53:17,259 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[SSTableBatchOpen:3,5,main]
java.io.IOError: java.io.IOException: Invalid argument
    at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:202)
    at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.complete(MmappedSegmentedFile.java:179)
    at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:429)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:200)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:153)
    at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:242)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Invalid argument
    at sun.nio.ch.FileChannelImpl.truncate0(Native Method)
    at sun.nio.ch.FileChannelImpl.map(Unknown Source)
    at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:194)
    ... 11 more

If I try with nodetool refresh I have the following error:

Exception in thread main java.io.IOError: java.io.IOException: Invalid argument
    at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:202)
    at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.complete(MmappedSegmentedFile.java:179)
    at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:429)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:200)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:153)
    at org.apache.cassandra.db.ColumnFamilyStore.loadNewSSTables(ColumnFamilyStore.java:510)
    at org.apache.cassandra.db.ColumnFamilyStore.loadNewSSTables(ColumnFamilyStore.java:468)
    at org.apache.cassandra.service.StorageService.loadNewSSTables(StorageService.java:3089)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Unknown Source)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(Unknown Source)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(Unknown Source)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(Unknown Source)
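Some context on the failing frame (createSegments): with disk_access_mode mmap, Cassandra maps each data file in chunks no larger than about 2 GiB, since a Java MappedByteBuffer is limited to Integer.MAX_VALUE bytes — which is why segmenting code appears in the trace for this ~7 GB file. The arithmetic below is only a sketch of that segmenting, assuming the 2 GiB ceiling; it says nothing about why the map call itself returned EINVAL, and disk_access_mode: standard simply avoids the mmap path altogether.

```python
import math

SEGMENT_LIMIT = 2**31 - 1   # Integer.MAX_VALUE: assumed per-segment mmap ceiling
file_size = 7_015_417_424   # bytes, from the "Opening ... (7015417424 bytes)" log line

print(math.ceil(file_size / SEGMENT_LIMIT))  # 4 -- mapped segments for this one file
```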
Re: Query regarding SSTable timestamps and counts
upgradesstables re-writes every sstable to have the same contents in the newest format. Agree. In the world of compaction, and excluding upgrades, having older sstables is expected. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 21/11/2012, at 11:45 AM, Edward Capriolo edlinuxg...@gmail.com wrote: On Tue, Nov 20, 2012 at 5:23 PM, aaron morton aa...@thelastpickle.com wrote: My understanding of the compaction process was that since data files keep continuously merging we should not have data files with very old last modified timestamps It is perfectly OK to have very old SSTables. But performing an upgradesstables did decrease the number of data files and removed all the data files with the old timestamps. upgradesstables re-writes every sstable to have the same contents in the newest format. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 19/11/2012, at 4:57 PM, Ananth Gundabattula agundabatt...@gmail.com wrote: Hello Aaron, Thanks a lot for the reply. Looks like the documentation is confusing. Here is the link I am referring to: http://www.datastax.com/docs/1.1/operations/tuning#tuning-compaction It does not disable compaction. As per the above url, After running a major compaction, automatic minor compactions are no longer triggered, frequently requiring you to manually run major compactions on a routine basis. ( Just before the heading Tuning Column Family compression in the above link) With respect to the replies below : it creates one big file, which will not be compacted until there are (by default) 3 other very big files. This is for the minor compaction and major compaction should theoretically result in one large file irrespective of the number of data files initially? This is not something you have to worry about. Unless you are seeing 1,000's of files using the default compaction. 
Well my worry has been because of the large amount of node movements we have done in the ring. We started off with 6 nodes and increased the capacity to 12 with disproportionate increases every time which resulted in a lot of clean of data folders except system, run repair and then a cleanup with an aborted attempt in between. There were some data.db files older by more than 2 weeks and were not modified since then. My understanding of the compaction process was that since data files keep continuously merging we should not have data files with very old last modified timestamps (assuming there is a good amount of writes to the table continuously) I did not have a for sure way of telling if everything is alright with the compaction looking at the last modified timestamps of all the data.db files. What are the compaction issues you are having ? Your replies confirm that the timestamps should not be an issue to worry about. So I guess I should not be calling them as issues any more. But performing an upgradesstables did decrease the number of data files and removed all the data files with the old timestamps. Regards, Ananth On Mon, Nov 19, 2012 at 6:54 AM, aaron morton aa...@thelastpickle.com wrote: As per datastax documentation, a manual compaction forces the admin to start compaction manually and disables the automated compaction (atleast for major compactions but not minor compactions ) It does not disable compaction. it creates one big file, which will not be compacted until there are (by default) 3 other very big files. 1. Does a nodetool stop compaction also force the admin to manually run major compaction ( I.e. disable automated major compactions ? ) No. Stop just stops the current compaction. Nothing is disabled. 2. Can a node restart reset the automated major compaction if a node gets into a manual mode compaction for whatever reason ? Major compaction is not automatic. It is the manual nodetool compact command. 
Automatic (minor) compaction is controlled by min_compaction_threshold and max_compaction_threshold (for the default compaction strategy). 3. What is the ideal number of SSTables for a table in a keyspace ( I mean are there any indicators as to whether my compaction is alright or not ? ) This is not something you have to worry about. Unless you are seeing 1,000's of files using the default compaction. For example, I have seen SSTables on the disk more than 10 days old wherein there were other SSTables belonging to the same table but much younger than the older SSTables ( No problems. 4. Does a upgradesstables fix any compaction issues ? What are the compaction issues you are having ? Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 18/11/2012, at 1:18 AM, Ananth Gundabattula agundabatt...@gmail.com
Re: Invalid argument
On Tue, Nov 20, 2012 at 2:03 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: Thanks for the work around, setting disk_access_mode: standard worked. Do you have working JNA, for reference? =Rob -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb
Re: Upgrade 1.1.2 - 1.1.6
Hi! I had the same problem (over-counting due to replay of the commit log, which ignored drain) after upgrading my cluster from 1.0.9 to 1.0.11. I updated the Cassandra tickets mentioned in this thread. Regards, Tamar Fraenkel Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956 On Tue, Nov 20, 2012 at 11:03 PM, Mike Heffner m...@librato.com wrote: On Tue, Nov 20, 2012 at 2:49 PM, Rob Coli rc...@palominodb.com wrote: On Mon, Nov 19, 2012 at 7:18 PM, Mike Heffner m...@librato.com wrote: We performed a 1.1.3 - 1.1.6 upgrade and found that all the logs replayed regardless of the drain. Your experience and desire for different (expected) behavior is welcomed on: https://issues.apache.org/jira/browse/CASSANDRA-4446 nodetool drain sometimes doesn't mark commitlog fully flushed If every production operator who experiences this issue shares their experience on this bug, perhaps the project will acknowledge and address it. Well, in this case I think our issue was that upgrading from nanotime to epoch seconds, by definition, replays all commit logs. That's not due to any specific problem with nodetool drain not marking commitlogs flushed, but a safety measure to ensure data is not lost due to buggy nanotime implementations. For us, it was that the pre-1.1.5 to 1.1.6 upgrade instructions didn't mention that commit logs should be removed if successfully drained. On the other hand, we do not use counters, so replaying them was merely a much longer MTT-Return after restarting with 1.1.6. Mike -- Mike Heffner m...@librato.com Librato, Inc.
Re: Looking for a good Ruby client
Thanks Mat! I thought you were going to expose the internals of CQL3 features like (wide rows with) complex keys and collections to CQL2 clients (which is something that should generally be possible, if Datastax' blog posts are accurate, i.e. an actual description of how things were implemented and not just a conceptual one). I'm still negotiating with my project lead on what features will ultimately be implemented, so I'm not sure whether CQL2/3 interoperability will actually make it into the final 'product' .. but it isn't very high up on the priority list, so it will most likely be implemented towards the end, and thus I guess it'll also kind of depend on how much CQL2 support will be provided by Cassandra itself when the time comes. 2012/11/20 Mat Brown m...@brewster.com Hi Timmy, I haven't done a lot of playing with CQL3 yet, mostly just reading the blog posts, so the following is subject to change : ) Right now, the Cequel model layer has a skinny row model (which is designed to follow common patterns of Ruby ORMs) and a wide row model (which is designed to behave more or less like a Hash, the equivalent of Java's HashMap). The two don't integrate with each other in any meaningful way, but as far as I understand it, they do pretty much cover the data modeling possibilities in CQL2. The big idea I've got for the overhaul of Cequel for CQL3 is to allow building a rich, nested data model by integrating different flavors of CQL3 table, most notably multi-column primary keys, as well as collections. The core data types I have in mind are: 1) Skinny row with simple primary key (e.g. blogs, with blog_id key) 2) Skinny row with complex primary key (e.g. blog_posts, with (blog_id, post_id) key) 3) Wide row with simple primary key (e.g. blog_languages -- kind of a weak example but i can't think of anything better for a blog : ) 4) Wide row with complex primary key (e.g. 
blog_post_tags) My goal is to make it easy to model one-one relationships via a shared primary key, and one-many via a shared prefix of the primary key. So, for instance, blogs and blog_languages rows would be one-one (both with a blog_id primary key) and blogs and blog_posts would be one-many (sharing the blog_id prefix in the primary key). From what I've read, it seems fairly clear that the actual CQL used to interact with #1 will be the same for CQL2 column families and CQL3 tables, so no explicit backward compatibility would be needed. #2 and #4 are, of course, CQL3-only, so backward compatibility isn't an issue there either. What I'm not entirely clear on is #3 -- this is straightforward in CQL2, and presumably a CQL3 table with compact storage would behave in the same way. However, my understanding so far is that a non-compact CQL3 table would treat this structure differently, in that both the key and value of the map would correspond to columns in a CQL3 table. It may make more sense to just target compact storage tables with this data structure, but I'm going to need to play around with it more to figure that out. Otherwise, Cequel will need to provide two flavors of that structure. There's also some tension between CQL3 collections and just using traditional wide-row structures to achieve the same thing. For instance, blog_tags could also just be a tags collection in the blogs table. My plan at this point is to offer both options, since each has its advantages (collections don't require the creation of a separate table; but a separate table gives you access to slices of the collection). Anyway, that's probably a lot more of an answer than you needed, but hopefully the context helps. Definitely interested to hear about the direction you take your client in as well. Finally, regarding a blog, we've got one set up, but it's not live yet. I'll ping you with a link when it is; I'll certainly be posting on the development of the next Cequel release. 
Cheers, Mat On Tue, Nov 20, 2012 at 9:23 AM, Timmy Turner timm.t...@gmail.com wrote: @Mat Brown: (while still retaining compatibility with CQL2 structures). Do you mean by exceeding what Cassandra itself provides in terms of CQL2/3 interoperability? I'm looking into something similar currently (however in Java not in Ruby) and would be interested in your experiences, if you follow through with the plan. Do you have a blog? Thanks! 2012/11/20 Alain RODRIGUEZ arodr...@gmail.com @Mat Well I guess you could add your Ruby client to this list since there is not a lot of them yet. http://wiki.apache.org/cassandra/ClientOptions Alain 2012/11/20 Mat Brown m...@brewster.com As the author of Cequel, I can assure you it is excellent ; ) We use it in production at Brewster and it is quite stable. If you try it out and find any bugs, we'll fix 'em quickly. I'm planning a big overhaul of the model layer over the holidays to expose all the new data modeling goodness
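For readers following Mat's four data shapes, here is a hypothetical CQL3 sketch of his blog example (table and column names are illustrative only, not Cequel's actual output):

```sql
-- 1) Skinny row, simple primary key
CREATE TABLE blogs (
  blog_id uuid PRIMARY KEY,
  name text
);

-- 2) Skinny row, compound primary key (one-many: a blog has many posts)
CREATE TABLE blog_posts (
  blog_id uuid,
  post_id timeuuid,
  title text,
  PRIMARY KEY (blog_id, post_id)
);

-- 3) Wide row, simple partition key (Hash-like; compact storage mirrors a
--    CQL2 wide row, per Mat's point about #3)
CREATE TABLE blog_languages (
  blog_id uuid,
  language text,
  enabled boolean,
  PRIMARY KEY (blog_id, language)
) WITH COMPACT STORAGE;

-- 4) Wide row, compound primary key
CREATE TABLE blog_post_tags (
  blog_id uuid,
  post_id timeuuid,
  tag text,
  PRIMARY KEY (blog_id, post_id, tag)
);
```

The shared blog_id prefix across these tables is what makes the one-one (blogs/blog_languages) and one-many (blogs/blog_posts) relationships he describes possible.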