Re: Rolling upgrade from 1.1.12 to 1.2.5 visibility issue
Hi Aaron, Thank you for your reply. We tried increasing the PHI threshold but still hit the same issue. We tried Ec2Snitch and PropertyFileSnitch instead and they work without this problem; it seems to happen only with the Ec2MultiRegionSnitch config. Although we can work around this problem with PropertyFileSnitch, we hit another bug: the EOFException in https://issues.apache.org/jira/browse/CASSANDRA-5476. We will try to upgrade to 1.1.12 first and wait for the fix of issue 5476. Thank you!

On Thu, Jun 20, 2013 at 5:49 PM, aaron morton aa...@thelastpickle.com wrote:

I once had something like this; looking at your logs I do not think it's the same thing, but here is a post on it: http://thelastpickle.com/2011/12/15/Anatomy-of-a-Cassandra-Partition/

It's a little different in 1.2, but the GossipDigestAckVerbHandler (and ACK2) should be calling Gossiper.instance.notifyFailureDetector, which will result in the FailureDetector being called. This will keep the remote node marked as up. It looks like this is happening:

TRACE [GossipTasks:1] 2013-06-19 07:44:52,359 FailureDetector.java (line 189) PHI for /54.254.xxx.xxx : 8.05616263930532

The default phi_convict_threshold is 8, so this node thinks the other is just sick enough to be marked as down. As a workaround, try increasing phi_convict_threshold to 12. Not sure why the 1.2 node thinks this; not sure if anything has changed.

I used to think there was a way to dump the phi values for nodes, but I cannot find it. If you call dumpInterArrivalTimes on the org.apache.cassandra.net:type=FailureDetector MBean it will dump a file in the temp dir called failuredetector-* with the arrival times for messages from the other nodes. That may help.

Cheers
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 19/06/2013, at 8:34 PM, Polytron Feng liqpolyt...@gmail.com wrote:

Hi, We are trying to do a rolling upgrade from 1.0.12 to 1.2.5, but we found that the 1.2.5 node cannot see the other old nodes.
Therefore, we tried to upgrade to 1.1.12 first, and it works. However, we still saw the same issue when rolling upgrade from 1.1.12 to 1.2.5. This looks like the issue fixed in https://issues.apache.org/jira/browse/CASSANDRA-5332, but we still saw it in 1.2.5.

Environment:
OS: CentOS 6
JDK: 6u31
Cluster: 3 nodes for testing, in EC2
Snitch: Ec2MultiRegionSnitch
NetworkTopologyStrategy: strategy_options = { ap-southeast:3 }

We have 3 nodes and we upgraded 122.248.xxx.xxx to 1.2.5 first; the other 2 nodes are still on 1.1.12. When we restarted the upgraded node, it saw the other 2 old nodes as UP in the log. However, after a few seconds, these 2 nodes were marked as DOWN.

This is the ring info from the 1.2.5 node - 122.248.xxx.xxx:

Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: ap-southeast
Address          Rack  Status  State   Load      Owns    Token
                                                         113427455640312821154458202477256070486
122.248.xxx.xxx  1b    Up      Normal  69.74 GB  33.33%  1
54.251.xxx.xxx   1b    Down    Normal  69.77 GB  33.33%  56713727820156410577229101238628035243
54.254.xxx.xxx   1b    Down    Normal  70.28 GB  33.33%  113427455640312821154458202477256070486

But the old 1.1.12 nodes can see the new node:

Note: Ownership information does not include topology, please specify a keyspace.
Address          DC            Rack  Status  State   Load      Owns    Token
                                                                      113427455640312821154458202477256070486
122.248.xxx.xxx  ap-southeast  1b    Up      Normal  69.74 GB  33.33%  1
54.251.xxx.xxx   ap-southeast  1b    Up      Normal  69.77 GB  33.33%  56713727820156410577229101238628035243
54.254.xxx.xxx   ap-southeast  1b    Up      Normal  70.28 GB  33.33%  113427455640312821154458202477256070486

We enabled trace log level to check gossip-related logs. The log below from the 1.2.5 node shows that the other 2 nodes are UP in the beginning. They seem to complete the SYN/ACK/ACK2 handshake cycle.
TRACE [GossipStage:1] 2013-06-19 07:44:43,047 GossipDigestSynVerbHandler.java (line 40) Received a GossipDigestSynMessage from /54.254.xxx.xxx
TRACE [GossipStage:1] 2013-06-19 07:44:43,047 GossipDigestSynVerbHandler.java (line 71) Gossip syn digests are : /54.254.xxx.xxx:1371617084:10967 /54.251.xxx.xxx:1371625851:2055
TRACE [GossipStage:1] 2013-06-19 07:44:43,048 Gossiper.java (line 945) requestAll for /54.254.xxx.xxx .
TRACE [GossipStage:1] 2013-06-19 07:44:43,080 GossipDigestSynVerbHandler.java (line 84) Sending a GossipDigestAckMessage to /54.254.xxx.xxx
TRACE [GossipStage:1] 2013-06-19 07:44:43,080 MessagingService.java (line 601) /122.248.216.142
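Aaron's phi numbers above can be reproduced with a simplified accrual-failure-detector sketch. This assumes exponentially distributed heartbeat intervals; Cassandra's actual FailureDetector works over a sliding window of arrival times, so real values will differ somewhat:

```python
import math

def phi(time_since_last_ms, intervals_ms):
    """Simplified accrual failure detector: phi = -log10(P(silence this
    long)), assuming exponential inter-arrival times with the observed
    mean. Not Cassandra's exact implementation."""
    mean = sum(intervals_ms) / len(intervals_ms)
    # P(interval > t) = exp(-t / mean)  =>  phi = (t / mean) * log10(e)
    return (time_since_last_ms / mean) * math.log10(math.e)

# Heartbeats normally arrive about every 1000 ms.
intervals = [1000] * 20

# After ~18.5 s of silence, phi crosses the default threshold of 8 ...
assert phi(18500, intervals) > 8
# ... but stays below a raised threshold of 12 (reached near ~27.6 s),
# which is why bumping phi_convict_threshold buys tolerance for late gossip.
assert phi(18500, intervals) < 12
```

Under this model, raising phi_convict_threshold from 8 to 12 roughly multiplies the tolerated silence by 1.5x before a node is convicted down.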
Re: Heap is not released and streaming hangs at 0%
nodetool -h localhost flush didn't do much good.

Do you have 100's of millions of rows? If so, see recent discussions about reducing the bloom_filter_fp_chance and index_sampling. If this is an old schema you may be using the very old setting of 0.000744, which creates a lot of bloom filters.

Cheers
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 20/06/2013, at 6:36 AM, Wei Zhu wz1...@yahoo.com wrote:

If you want, you can try to force the GC through JConsole (Memory > Perform GC). It theoretically triggers a full GC, and when it happens depends on the JVM. -Wei

From: Robert Coli rc...@eventbrite.com To: user@cassandra.apache.org Sent: Tuesday, June 18, 2013 10:43:13 AM Subject: Re: Heap is not released and streaming hangs at 0%

On Tue, Jun 18, 2013 at 10:33 AM, srmore comom...@gmail.com wrote:

But then shouldn't the JVM GC it eventually? I can still see Cassandra alive and kicking, but it looks like the heap is locked up even after the traffic has long stopped.

No, when the GC system fails this hard it is often a permanent failure which requires a restart of the JVM.

nodetool -h localhost flush didn't do much good.

This adds support to the idea that your heap is too full, and not full of memtables. You could try nodetool -h localhost invalidatekeycache, but that probably will not free enough memory to help you. =Rob
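Aaron's point about the legacy 0.000744 bloom_filter_fp_chance can be made concrete with the standard bloom-filter sizing formula. This is a back-of-the-envelope estimate, not Cassandra's exact heap accounting:

```python
import math

def bloom_bits_per_key(fp_chance):
    # Optimal bloom filter sizing: m/n = -ln(p) / (ln 2)^2 bits per key
    return -math.log(fp_chance) / (math.log(2) ** 2)

old = bloom_bits_per_key(0.000744)  # the very old default Aaron mentions
new = bloom_bits_per_key(0.01)      # a commonly raised value in the 1.2 era

# The legacy setting needs roughly 1.5x the bits per key, which adds up
# fast across hundreds of millions of rows held on heap.
rows = 300_000_000
print(f"old: {old * rows / 8 / 1024**3:.2f} GiB")
print(f"new: {new * rows / 8 / 1024**3:.2f} GiB")
```

The row count here is illustrative ("100's of millions of rows"); the point is that the per-key cost scales with -ln(fp_chance), so loosening the false-positive target directly shrinks heap usage.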
Re: Joining distinct clusters with the same schema together
Question 2: is this a sane strategy?

On its face my answer is not... really?

I'd go with a solid no. Just because the three independent clusters have a schema that looks the same does not make them the same. The schema is a versioned document; you will not be able to merge them by merging the DCs later without downtime. It will be easier to go with a multi-DC setup from the start.

Cheers
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 20/06/2013, at 6:36 AM, Eric Stevens migh...@gmail.com wrote:

On its face my answer is not... really?

What do you view yourself as getting with this technique versus using built-in replication? As an example, you lose the ability to do LOCAL_QUORUM vs EACH_QUORUM consistency level operations. Doing replication manually sounds like a recipe for the DCs eventually getting subtly out of sync with each other. If a connection goes down between DCs, and you are taking data at both, how will you catch each other up? C* already offers that resolution for you, and you'd have to work pretty hard to reproduce it for no obvious benefit that I can see. For minimum effort, definitely rely on Cassandra's well-tested codebase for this.

On Wed, Jun 19, 2013 at 2:27 PM, Robert Coli rc...@eventbrite.com wrote:

On Wed, Jun 19, 2013 at 10:50 AM, Faraaz Sareshwala fsareshw...@quantcast.com wrote:

Each datacenter will have a cassandra cluster with a separate set of seeds specific to that datacenter. However, the cluster name will be the same. Question 1: is this enough to guarantee that the three datacenters will have distinct cassandra clusters as well? Or will one node in datacenter A still somehow be able to join datacenter B's ring?

If they have network connectivity and the same cluster name, they are the same logical cluster.
However, if your nodes share tokens and you have auto_bootstrap=yes (the implicit default), the second node you attempt to start will refuse to start, because you are trying to bootstrap it into the range of a live node.

For now, we are planning on using our own relay mechanism to transfer data changes from one datacenter to another.

Are you planning to use the streaming commitlog functionality for this? Not sure how you would capture all changes otherwise, except by having your app just write the same thing to multiple places. Unless data timestamps are identical between clusters, otherwise-identical data will not merge properly, as cassandra uses data timestamps to merge.

Question 2: is this a sane strategy?

On its face my answer is not... really? What do you view yourself as getting with this technique versus using built-in replication? As an example, you lose the ability to do LOCAL_QUORUM vs EACH_QUORUM consistency level operations.

Question 3: eventually, we want to turn all these cassandra clusters into one large multi-datacenter cluster. What's the best practice to do this? Should I just add nodes from all datacenters to the list of seeds and let cassandra resolve differences? Is there another way I don't know about?
If you are using NetworkTopologyStrategy and have the same cluster name for your isolated clusters, all you need to do is:

1) configure NTS to store replicas on a per-datacenter basis
2) ensure that your nodes are in different logical data centers (by default, all nodes are in DC1/rack1)
3) ensure that clusters are able to reach each other
4) ensure that tokens do not overlap between clusters (the common technique with manual token assignment is that each node gets a range which is off-by-one)
5) ensure that all nodes' seed lists contain (recommended) 3 seeds from each DC
6) rolling restart (so the new seed list is picked up)
7) repair (should only be required if writes have not replicated via your out-of-band mechanism)

Vnodes change the picture slightly, because the chance of your clusters having conflicting tokens increases with the number of token ranges you have. =Rob
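Step 1 in Rob's list is a schema change; in 1.2-era CQL3 it would look something like the following sketch (keyspace name, datacenter names, and replica counts are placeholders, not from the thread):

```sql
-- Replicate 3 copies per logical datacenter instead of SimpleStrategy's
-- single ring-wide count; DC names must match what the snitch reports.
ALTER KEYSPACE my_keyspace WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC1': 3,
  'DC2': 3
};
```

Running this before the clusters can see each other means replicas are placed correctly as soon as the rings merge.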
Re: error on startup: unable to find sufficient sources for streaming range
On some of my nodes, I'm getting the following exception when cassandra starts

How many nodes? Is this a new node, or an old one where this problem just started? What version are you on?

Do you have this error from system.log? It includes the thread name, which is handy for debugging. It also looks like there are some lines missing from the first error. It looks like an error that may happen when a node is bootstrapping or replacing an existing node. If you can provide some more context we may be able to help.

Cheers
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 20/06/2013, at 10:36 AM, Faraaz Sareshwala fsareshw...@quantcast.com wrote:

Hi, I couldn't find any information on the following error, so I apologize if it has already been discussed. On some of my nodes, I'm getting the following exception when cassandra starts up:

2013-06-19 22:17:39.480414500 Exception encountered during startup: unable to find sufficient sources for streaming range (-4250921392403750427,-4250887922781325324]
2013-06-19 22:17:39.482733500 ERROR Exception in thread Thread[StorageServiceShutdownHook,5,main] (CassandraDaemon.java:org.apache.cassandra.service.CassandraDaemon$1:175)
2013-06-19 22:17:39.482735500 java.lang.NullPointerException
2013-06-19 22:17:39.482735500 at org.apache.cassandra.service.StorageService.stopRPCServer(StorageService.java:321)
2013-06-19 22:17:39.482736500 at org.apache.cassandra.service.StorageService.shutdownClientServers(StorageService.java:362)
2013-06-19 22:17:39.482736500 at org.apache.cassandra.service.StorageService.access$000(StorageService.java:88)
2013-06-19 22:17:39.482751500 at org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:513)
2013-06-19 22:17:39.482752500 at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
2013-06-19 22:17:39.482752500 at java.lang.Thread.run(Thread.java:662)

Can someone point me to more information about what could cause
this error? Faraaz
Re: Performance Difference between Cassandra version
I am trying to see whether there will be any performance difference between Cassandra 1.0.8 vs Cassandra 1.2.2, mainly for reading data.

1.0 has key and row caches defined per CF; 1.1 has global ones, which are better utilised and easier to manage. 1.2 moves bloom filters and compression meta off heap, which reduces GC, which will help. Things normally get faster.

Cheers
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 20/06/2013, at 11:24 AM, Franc Carter franc.car...@sirca.org.au wrote:

On Thu, Jun 20, 2013 at 9:18 AM, Raihan Jamal jamalrai...@gmail.com wrote:

I am trying to see whether there will be any performance difference between Cassandra 1.0.8 vs Cassandra 1.2.2 for reading the data mainly? Has anyone seen any major performance difference?

We are part way through a performance comparison between 1.0.9 with Size Tiered Compaction and 1.2.4 with Leveled Compaction - for our use case it looks like a significant performance improvement on the read side. We are finding compaction lags when we do very large bulk loads, but for us this is an initialisation task and that's a reasonable trade-off.

cheers

-- Franc Carter | Systems architect | Sirca Ltd
franc.car...@sirca.org.au | www.sirca.org.au
Tel: +61 2 8355 2514
Level 4, 55 Harrington St, The Rocks NSW 2000
PO Box H58, Australia Square, Sydney NSW 1215
Re: Unit Testing Cassandra
2) Second (in which I am more interested) is for performance (stress/load) testing.

Sometimes you can get cassandra-stress (shipped in the bin distro) to approximate the expected work load. It's then pretty easy to benchmark and test your configuration changes.

Cheers
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 20/06/2013, at 2:25 PM, Shahab Yunus shahab.yu...@gmail.com wrote:

Thanks Edward, Ben and Dean for the pointers. Yes, I am using Java and these sound promising, at least for unit testing. Regards, Shahab

On Wed, Jun 19, 2013 at 9:58 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

You really do not need much; in java you can use the embedded server. Hector wraps a simple class around this called EmbeddedServerHelper.

On Wednesday, June 19, 2013, Ben Boule ben_bo...@rapid7.com wrote:

Hi Shahab, Cassandra-Unit has been helpful for us for running unit tests without requiring a real cassandra instance to be running. We only use this to test our DAO code, which interacts with the Cassandra client. It basically starts up an embedded instance of cassandra and fools your client/driver into using it. It uses a non-standard port, and you just need to make sure you can set the port as a parameter in your client code. https://github.com/jsevellec/cassandra-unit

One important thing is to either clear out the keyspace between tests or carefully separate your data so different tests don't collide with each other in the embedded database. Setup/tear-down time is pretty reasonable. Ben

From: Shahab Yunus [shahab.yu...@gmail.com] Sent: Wednesday, June 19, 2013 8:46 AM To: user@cassandra.apache.org Subject: Re: Unit Testing Cassandra

Thanks Stephen for your reply and explanation. My bad that I mixed those up and wasn't clear enough. Yes, I have 2 different requests/questions. 1) One is for unit testing. 2) Second (in which I am more interested) is for performance (stress/load) testing.
Let us keep integration aside for now. I do see some stuff out there, but wanted to know recommendations from the community given their experience. Regards, Shahab

On Wed, Jun 19, 2013 at 3:15 AM, Stephen Connolly stephen.alan.conno...@gmail.com wrote:

Unit testing means testing the smallest part in isolation. Unit tests should not take more than a few milliseconds to set up and verify their assertions. As such, if your code is not factored well for testing, you would typically use mocking (either by hand, or with mocking libraries) to mock out the bits not under test. Extensive use of mocks is usually a smell of code that is not well designed *for testing*.

If you intend to test components integrated together... that is integration testing. If you intend to test performance of the whole or significant parts of the whole... that is performance testing. When searching for the above, you will not get much luck if you are looking for them in the context of unit testing, as those things are *outside the scope of unit testing*.

On Wednesday, 19 June 2013, Shahab Yunus wrote:

Hello, Can anyone suggest good/popular unit test tools/frameworks/utilities out there for unit testing Cassandra stores? I am looking for testing from a performance/load and monitoring perspective. I am using 1.2. Thanks a lot. Regards, Shahab

-- Sent from my phone

This electronic message contains information which may be confidential or privileged. The information is intended for the use of the individual or entity named above. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of the contents of this information is prohibited. If you have received this electronic transmission in error, please notify us by e-mail at (postmas...@rapid7.com) immediately.
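For the stress/load question, a typical 1.2-era cassandra-stress invocation looks roughly like the sketch below. Hosts and numbers are placeholders, and the tool's flags changed between versions, so check `cassandra-stress -h` for your build before relying on these:

```shell
# Insert 1M rows against two nodes with 50 client threads, then read them
# back with the same settings to compare read latencies.
# -d: nodes to contact  -n: number of keys  -t: client threads  -o: operation
tools/bin/cassandra-stress -d 10.0.0.1,10.0.0.2 -n 1000000 -t 50 -o insert
tools/bin/cassandra-stress -d 10.0.0.1,10.0.0.2 -n 1000000 -t 50 -o read
```

Running the same invocation before and after a configuration change gives a quick, repeatable benchmark, per Aaron's suggestion above.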
Re: Get fragments of big files (videos)
You should split the large blobs into multiple rows, and I would use 10MB per row as a good rule of thumb. See http://www.datastax.com/dev/blog/cassandra-file-system-design for a description of a blob store in cassandra.

Cheers
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 20/06/2013, at 8:54 PM, Simon Majou si...@majou.org wrote:

Thanks Serge. Simon

On Thu, Jun 20, 2013 at 10:48 AM, Serge Fonville serge.fonvi...@gmail.com wrote:

Also, after a quick Google: http://wiki.apache.org/cassandra/CassandraLimitations states values cannot exceed 2GB; it also answers your offset question. HTH

Kind regards/met vriendelijke groet, Serge Fonville http://www.sergefonville.nl

Convince Microsoft! They need to add TRUNCATE PARTITION in SQL Server https://connect.microsoft.com/SQLServer/feedback/details/417926/truncate-partition-of-partitioned-table

2013/6/20 Sachin Sinha sinha.sac...@gmail.com

Fragment them in rows, that will help.

On 20 June 2013 09:43, Simon Majou si...@majou.org wrote:

Hello, If I store a video into a column, how can I get a fragment of it without having to download it entirely? Is there a way to give an offset on a column? Do I have to fragment it over a lot of small fixed-size columns? Is there any disadvantage to doing so? For example, fragmenting a 10GB file into 1,000 columns of 10 MB? Simon
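The fragmentation Sachin and Aaron describe can be sketched on the client side. The helper names here are illustrative, not a Cassandra API; the idea is that each chunk becomes its own row/column keyed by (file id, chunk index), so a byte range maps directly to a small set of chunk reads:

```python
CHUNK_SIZE = 10 * 1024 * 1024  # 10 MB per chunk, per the rule of thumb above

def split_blob(blob: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield (chunk_index, chunk_bytes) pairs; each pair is stored as one
    row/column so no single value approaches the 2GB limit."""
    for offset in range(0, len(blob), chunk_size):
        yield offset // chunk_size, blob[offset:offset + chunk_size]

def chunks_for_range(offset: int, length: int, chunk_size: int = CHUNK_SIZE):
    """Which chunk indices cover bytes [offset, offset + length)? This is
    how a client reads a fragment of the video without fetching it all."""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return range(first, last + 1)

# A 25 MB "video" splits into three chunks and reassembles losslessly.
blob = bytes(25 * 1024 * 1024)
chunks = dict(split_blob(blob))
assert len(chunks) == 3
assert b"".join(chunks[i] for i in sorted(chunks)) == blob

# Reading 1 MB starting at the 9.5 MB mark touches only chunks 0 and 1.
assert list(chunks_for_range(int(9.5 * 1024**2), 1024**2)) == [0, 1]
```

This answers Simon's offset question: there is no per-column offset read, but with fixed-size chunks the client computes which rows to fetch for any byte range.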
Re: Compaction not running
Do you think it's worth posting an issue, or not enough traceable evidence?

If you can reproduce it then certainly file a bug.

Cheers
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 20/06/2013, at 9:41 PM, Franc Carter franc.car...@sirca.org.au wrote:

On Thu, Jun 20, 2013 at 7:27 PM, aaron morton aa...@thelastpickle.com wrote:

nodetool compactionstats gives pending tasks: 13120

If there are no errors in the log, I would say this is a bug.

This happened after the node ran out of file descriptors, so an edge case wouldn't surprise me. I've rebuilt the node (blown the data away and am running a nodetool rebuild). Do you think it's worth posting an issue, or not enough traceable evidence? cheers

On 19/06/2013, at 11:41 AM, Franc Carter franc.car...@sirca.org.au wrote:

On Wed, Jun 19, 2013 at 9:34 AM, Bryan Talbot btal...@aeriagames.com wrote:

Manual compaction for LCS doesn't really do much. It certainly doesn't compact all those little files into bigger files. What makes you think that compactions are not occurring?

Yeah, that's what I thought, however:

nodetool compactionstats gives
pending tasks: 13120
Active compaction remaining time : n/a

When I run nodetool compact in a loop, the pending tasks go down gradually. This node also has vastly higher latencies (x10) than the other nodes.
I saw this with a previous CF that I 'manually compacted', and when the pending tasks reached low numbers (stuck on 9) the latencies were back to low milliseconds.

cheers -Bryan

On Tue, Jun 18, 2013 at 3:59 PM, Franc Carter franc.car...@sirca.org.au wrote:

On Sat, Jun 15, 2013 at 11:49 AM, Franc Carter franc.car...@sirca.org.au wrote:

On Sat, Jun 15, 2013 at 8:48 AM, Robert Coli rc...@eventbrite.com wrote:

On Wed, Jun 12, 2013 at 3:26 PM, Franc Carter franc.car...@sirca.org.au wrote:

We are running a test system with Leveled compaction on Cassandra-1.2.4. While doing an initial load of the data, one of the nodes ran out of file descriptors and since then it hasn't been automatically compacting.

You have (at least) two options:
1) increase file descriptors available to Cassandra with ulimit, if possible
2) increase the size of your sstables with levelled compaction, such that you have fewer of them

Oops, I wasn't clear enough. I have increased the number of file descriptors and no longer have a file descriptor issue. However, the node still doesn't compact automatically. If I run 'nodetool compact' it will do a small amount of compaction and then stop.
The Column Family is using LCS. Any ideas on this - compaction is still not automatically running for one of my nodes. thanks, cheers

=Rob

-- Franc Carter | Systems architect | Sirca Ltd
franc.car...@sirca.org.au | www.sirca.org.au
Tel: +61 2 8355 2514
Level 4, 55 Harrington St, The Rocks NSW 2000
PO Box H58, Australia Square, Sydney NSW 1215
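Rob's option 1 (raising file descriptors) is usually made persistent via the PAM limits file on CentOS; a sketch with illustrative values (your packaging may already ship a limits.d entry for the cassandra user, so check before editing):

```shell
# /etc/security/limits.conf -- raise the open-file limit for the user
# running Cassandra (values here are illustrative, not a recommendation)
cassandra  soft  nofile  100000
cassandra  hard  nofile  100000

# Verify what the running JVM actually got (substitute its pid):
# grep 'open files' /proc/<pid>/limits
```

Checking /proc/<pid>/limits matters because a ulimit set in a shell does not apply to a daemon started by init.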
Re: Confirm with cqlsh of Cassandra-1.2.5, the behavior of the export/import
That looks like it may be a bug; can you raise a ticket at https://issues.apache.org/jira/browse/CASSANDRA ?

Cheers
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 21/06/2013, at 1:56 AM, hiroshi.kise...@hitachi.com wrote:

Dear everyone, I'm Hiroshi Kise. I have been checking the behavior of data export/import with cqlsh in Cassandra-1.2.5. When the data includes '{' or '[' (i.e. collection types), I think data integrity is compromised in the COPY export/import process. What do you think? If there is an error in my table definitions, please tell me the right way. The concrete steps are as follows.

(1) map type's export/import

Export:
[root@castor bin]# ./cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 3.0.2 | Cassandra 1.2.5 | CQL spec 3.0.0 | Thrift protocol 19.36.0]
Use HELP for help.
cqlsh> create keyspace maptestks with replication = { 'class' : 'SimpleStrategy', 'replication_factor' : '1' };
cqlsh> use maptestks;
cqlsh:maptestks> create table maptestcf (rowkey varchar PRIMARY KEY, targetmap map<varchar,varchar>);
cqlsh:maptestks> insert into maptestcf (rowkey, targetmap) values ('rowkey', {'mapkey':'mapvalue'});
cqlsh:maptestks> select * from maptestcf;

 rowkey | targetmap
--------+--------------------
 rowkey | {mapkey: mapvalue}

cqlsh:maptestks> copy maptestcf to 'maptestcf-20130619.txt';
1 rows exported in 0.008 seconds.
cqlsh:maptestks> exit;
[root@castor bin]# cat maptestcf-20130619.txt
rowkey,{mapkey: mapvalue}    (a)

Import:
[root@castor bin]# ./cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 3.0.2 | Cassandra 1.2.5 | CQL spec 3.0.0 | Thrift protocol 19.36.0]
Use HELP for help.
cqlsh> create keyspace mapimptestks with replication = { 'class' : 'SimpleStrategy', 'replication_factor' : '1' };
cqlsh> use mapimptestks;
cqlsh:mapimptestks> create table mapimptestcf (rowkey varchar PRIMARY KEY, targetmap map<varchar,varchar>);
cqlsh:mapimptestks> copy mapimptestcf from 'maptestcf-20130619.txt';
Bad Request: line 1:83 no viable alternative at input '}'
Aborting import at record #0 (line 1). Previously-inserted values still present.
0 rows imported in 0.025 seconds.

(2) list type's export/import

Export:
[root@castor bin]# ./cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 3.0.2 | Cassandra 1.2.5 | CQL spec 3.0.0 | Thrift protocol 19.36.0]
Use HELP for help.
cqlsh> create keyspace listtestks with replication = { 'class' : 'SimpleStrategy', 'replication_factor' : '1' };
cqlsh> use listtestks;
cqlsh:listtestks> create table listtestcf (rowkey varchar PRIMARY KEY, value list<varchar>);
cqlsh:listtestks> insert into listtestcf (rowkey, value) values ('rowkey', ['value1','value2']);
cqlsh:listtestks> select * from listtestcf;

 rowkey | value
--------+------------------
 rowkey | [value1, value2]

cqlsh:listtestks> copy listtestcf to 'listtestcf-20130619.txt';
1 rows exported in 0.014 seconds.
cqlsh:listtestks> exit;
[root@castor bin]# cat listtestcf-20130619.txt
rowkey,[value1, value2]    (b)

Import:
[root@castor bin]# ./cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 3.0.2 | Cassandra 1.2.5 | CQL spec 3.0.0 | Thrift protocol 19.36.0]
Use HELP for help.
cqlsh> create keyspace listimptestks with replication = { 'class' : 'SimpleStrategy', 'replication_factor' : '1' };
cqlsh> use listimptestks;
cqlsh:listimptestks> create table listimptestcf (rowkey varchar PRIMARY KEY, value list<varchar>);
cqlsh:listimptestks> copy listimptestcf from 'listtestcf-20130619.txt';
Bad Request: line 1:79 no viable alternative at input ']'
Aborting import at record #0 (line 1). Previously-inserted values still present.
0 rows imported in 0.030 seconds.
Reference (may be correct, or wrong in another way): I manually rewrote the export files.

[root@castor bin]# cat nlisttestcf-20130619.txt
rowkey,['value1',' value2']
cqlsh:listimptestks> copy listimptestcf from 'nlisttestcf-20130619.txt';
1 rows imported in 0.035 seconds.
cqlsh:listimptestks> select * from implisttestcf;

 rowkey | value
--------+------------------
 rowkey | [value1, value2]

cqlsh:implisttestks> exit;
[root@castor bin]# cat nmaptestcf-20130619.txt
rowkey,"{'mapkey': 'mapvalue'}"
[root@castor bin]# ./cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 3.0.2 | Cassandra 1.2.5 | CQL spec 3.0.0 | Thrift protocol 19.36.0]
Use HELP for help.
cqlsh> use mapimptestks;
cqlsh:mapimptestks> copy mapimptestcf from 'nmaptestcf-20130619.txt';
1 rows imported
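Hiroshi's manual rewrite works because the exported collection literal contains the CSV field delimiter, so it must be quoted into a single field to round-trip. Python's csv module illustrates the quoting issue (this models the CSV mechanics only, not cqlsh's COPY implementation):

```python
import csv
import io

# The exported list literal contains a comma, so an unquoted dump splits
# it into two CSV fields -- which is why the naive re-import fails.
row = ["rowkey", "['value1', 'value2']"]

buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_MINIMAL).writerow(row)
line = buf.getvalue()

# QUOTE_MINIMAL wraps only the field containing the delimiter in quotes...
expected = 'rowkey,"' + "['value1', 'value2']" + '"'
assert line.strip() == expected

# ...so reading it back recovers both fields intact, exactly the effect
# of Hiroshi's hand-added quotes around the map literal.
assert next(csv.reader(io.StringIO(line))) == row
```

In other words, the export side omitting quotes around collection literals is what breaks the import, which supports Aaron's suggestion to file this as a bug.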
Re: block size
If I have data in a column of size 500KB...

Also some information here: http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/

The data files are memory mapped, so it's sort of OS dependent.

A
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 21/06/2013, at 8:29 AM, Shahab Yunus shahab.yu...@gmail.com wrote:

Ok. Though the closest that I can find is this (Aaron Morton's great blog): http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ I would also like to know the answer, as I also haven't come across 'block size' as a core concept in Cassandra (or a concept to be considered while developing with it), unlike Hadoop. Regards, Shahab

On Thu, Jun 20, 2013 at 3:38 PM, Kanwar Sangha kan...@mavenir.com wrote:

Yes. Is that not specific to hadoop with CFS? I want to know: if I have data in a column of size 500KB, how many IOPS are needed to read it? (assuming we have the key cache enabled)

From: Shahab Yunus [mailto:shahab.yu...@gmail.com] Sent: 20 June 2013 14:32 To: user@cassandra.apache.org Subject: Re: block size

Have you seen this? http://www.datastax.com/dev/blog/cassandra-file-system-design Regards, Shahab

On Thu, Jun 20, 2013 at 3:17 PM, Kanwar Sangha kan...@mavenir.com wrote:

Hi - What is the block size for Cassandra? Is it taken from the OS defaults?
Re: Compaction not running
On Fri, Jun 21, 2013 at 6:16 PM, aaron morton aa...@thelastpickle.com wrote:

Do you think it's worth posting an issue, or not enough traceable evidence?

If you can reproduce it then certainly file a bug.

I'll keep my eye on it to see if it happens again and whether there is a pattern. cheers

-- Franc Carter | Systems architect | Sirca Ltd
franc.car...@sirca.org.au | www.sirca.org.au
Tel: +61 2 8355 2514
Level 4, 55 Harrington St, The Rocks NSW 2000
PO Box H58, Australia Square, Sydney NSW 1215
Re: nodetool ring showing different 'Load' size
Ok. Thank you, all. Regards, *Rodrigo Felix de Almeida* LSBD - Universidade Federal do Ceará Project Manager MBA, CSM, CSPO, SCJP On Wed, Jun 19, 2013 at 2:26 PM, Robert Coli rc...@eventbrite.com wrote: On Wed, Jun 19, 2013 at 5:47 AM, Michal Michalski mich...@opera.com wrote: You can also perform a major compaction via nodetool compact (for SizeTieredCompaction), but - again - you really should not do it unless you're really sure what you're doing, as it compacts all the SSTables together, which is not something you want in most cases. If you do that and discover you did not want to: https://github.com/pcmanus/cassandra/tree/sstable_split will enable you to split your monolithic sstable back into smaller sstables. =Rob PS - @pcmanus, here's that reminder we discussed @ summit to merge this tool into upstream! :D
Re: [Cassandra] Replacing a cassandra node
Is there a way to replace a failed server using vnodes? I only had occasion to do this once, on a relatively small cluster. At the time I just needed to get the new server online and wasn't concerned about the performance implications, so I just removed the failed server from the cluster and bootstrapped a new one. Of course that caused a bunch of key reassignments, so I'm sure it would be less work for the cluster if I could bring a new server online with the same vnodes as the failed server. On Thu, Jun 20, 2013 at 2:40 PM, Robert Coli rc...@eventbrite.com wrote: On Thu, Jun 20, 2013 at 10:40 AM, Emalayan Vairavanathan svemala...@yahoo.com wrote: In the case where we replace a Cassandra node (call it node A) with another one that has the exact same IP (i.e. during a node failure), what exactly should we do? Currently I understand that we should at least run nodetool repair. If you lost the data from the node, then what you want is replace_token. If you didn't lose the data from the node (and can tolerate stale reads until the repair completes) you want to start the node with auto_bootstrap set to false and then repair. =Rob
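A minimal sketch of the replace_token path Rob mentions, for a non-vnode node (the token value and init-script name are illustrative, not taken from the thread):

```shell
# On the replacement node: data directories must be empty, then add the
# dead node's token to cassandra-env.sh (token value is illustrative):
#   JVM_OPTS="$JVM_OPTS -Dcassandra.replace_token=85070591730234615865843651857942052864"
# Start the node; it will stream the dead node's range from the replicas:
sudo service cassandra start
# Watch streaming progress while the replacement catches up:
nodetool -h localhost netstats
```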
Re: timeuuid and cql3 query
It's my understanding that if the first part of the primary key has low cardinality, you will struggle with cluster balance as (unless you use WITH COMPACT STORAGE) the first entry of the primary key equates to the row key from the traditional interface, thus all entries related to a single value for the counter column will map to the same partition. So consider the cardinality of this field. If cardinality is low, you might need to remodel with PRIMARY KEY (counter, ts, key1) then tack on WITH COMPACT STORAGE (then the entire primary key becomes the row key, but you can only have one column which is not part of the primary key). If cardinality of counter is high, then you have nothing to worry about. On Wed, Jun 19, 2013 at 3:16 PM, Francisco Andrades Grassi bigjoc...@gmail.com wrote: Hi, I believe what he's recommending is: CREATE TABLE count3 ( counter text, ts timeuuid, key1 text, value int, PRIMARY KEY (counter, ts) ) That way *counter* will be your partitioning key, and all the rows that have the same *counter* value will be clustered (stored as a single wide row sorted by the *ts* value). In this scenario the query: where counter = 'test' and ts > minTimeuuid('2013-06-18 16:23:00') and ts < minTimeuuid('2013-06-18 16:24:00'); would actually be a sequential read on a wide row on a single node. -- Francisco Andrades Grassi www.bigjocker.com @bigjocker On Jun 19, 2013, at 12:17 PM, Ryan, Brent br...@cvent.com wrote: Tyler, You're recommending this schema instead, correct? CREATE TABLE count3 ( counter text, ts timeuuid, key1 text, value int, PRIMARY KEY (ts, counter) ) I believe I tried this as well and ran into similar problems but I'll try it again. I'm using the ByteOrderedPartitioner if that helps with the latest version of DSE community edition which I believe is Cassandra 1.2.3.
Thanks, Brent From: Tyler Hobbs ty...@datastax.com Reply-To: user@cassandra.apache.org user@cassandra.apache.org Date: Wednesday, June 19, 2013 11:00 AM To: user@cassandra.apache.org user@cassandra.apache.org Subject: Re: timeuuid and cql3 query On Wed, Jun 19, 2013 at 8:08 AM, Ryan, Brent br...@cvent.com wrote: CREATE TABLE count3 ( counter text, ts timeuuid, key1 text, value int, PRIMARY KEY ((counter, ts)) ) Instead of doing a composite partition key, remove a set of parens and let ts be your clustering key. That will cause cql rows to be stored in sorted order by the ts column (for a given value of counter) and allow you to do the kind of query you're looking for. -- Tyler Hobbs DataStax http://datastax.com/
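For reference, Tyler's suggestion written out end to end (keyspace name is a placeholder; statements are a sketch against a 1.2-era CQL3 cluster, not copied from the thread):

```shell
cqlsh <<'CQL'
CREATE TABLE myks.count3 (
  counter text,
  ts timeuuid,
  key1 text,
  value int,
  PRIMARY KEY (counter, ts)  -- counter = partition key, ts = clustering key
);
-- The slice query then reads one partition sequentially, in ts order:
SELECT * FROM myks.count3
 WHERE counter = 'test'
   AND ts > minTimeuuid('2013-06-18 16:23:00')
   AND ts < minTimeuuid('2013-06-18 16:24:00');
CQL
```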
Re: Heap is not released and streaming hangs at 0%
On Fri, Jun 21, 2013 at 2:53 AM, aaron morton aa...@thelastpickle.com wrote: nodetool -h localhost flush didn't do much good. Do you have 100's of millions of rows? If so see recent discussions about reducing the bloom_filter_fp_chance and index_sampling. Yes, I have 100's of millions of rows. If this is an old schema you may be using the very old setting of 0.000744 which creates a lot of bloom filters. The bloom_filter_fp_chance value was changed from the default to 0.1; I looked at the filters and they are about 2.5G on disk and I have around 8G of heap. I will try increasing the value to 0.7 and report my results. It also appears to be a case of hard GC failure (as Rob mentioned) as the heap is never released; even after 24+ hours of idle time, the JVM needs to be restarted to reclaim the heap. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 20/06/2013, at 6:36 AM, Wei Zhu wz1...@yahoo.com wrote: If you want, you can try to force the GC through Jconsole: Memory -> Perform GC. It theoretically triggers a full GC; when it will happen depends on the JVM. -Wei -- *From: *Robert Coli rc...@eventbrite.com *To: *user@cassandra.apache.org *Sent: *Tuesday, June 18, 2013 10:43:13 AM *Subject: *Re: Heap is not released and streaming hangs at 0% On Tue, Jun 18, 2013 at 10:33 AM, srmore comom...@gmail.com wrote: But then shouldn't the JVM GC it eventually? I can still see Cassandra alive and kicking but looks like the heap is locked up even after the traffic is long stopped. No, when the GC system fails this hard it is often a permanent failure which requires a restart of the JVM. nodetool -h localhost flush didn't do much good. This adds support to the idea that your heap is too full, and not full of memtables. You could try nodetool -h localhost invalidatekeycache, but that probably will not free enough memory to help you. =Rob
Re: timeuuid and cql3 query
Yes. The problem is that I can't use counter as the partition key otherwise I'd wind up with hot spots in my cluster where the majority of the data is being written to a single node in the cluster. The only real way around this problem with Cassandra is to follow along with what this blog does: http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra From: Eric Stevens migh...@gmail.com Reply-To: user@cassandra.apache.org Date: Friday, June 21, 2013 8:38 AM To: user@cassandra.apache.org Subject: Re: timeuuid and cql3 query [earlier messages quoted in full - snipped]
NREL has released open source Databus on github for time series data
NREL has released their open source databus. They spin it as energy data (and a system for campus energy/building energy) but it is very general right now and probably will stay pretty general. More information can be found here http://www.nrel.gov/analysis/databus/ The source code can be found here https://github.com/deanhiller/databus Star the project if you like the idea. NREL just did a big press release and is developing a community around the project. It is in its early stages but there are users using it and I am helping HP set an instance up this month. If you want to become a committer on the project, let me know as well. Later, Dean
Cassandra terminates with OutOfMemory (OOM) error
We have a 3-node cassandra cluster on AWS. These nodes are running cassandra 1.2.2 and have 8GB memory. We didn't change any of the default heap or GC settings. So each node is allocating 1.8GB of heap space. The rows are wide; each row stores around 260,000 columns. We are reading the data using Astyanax. If our application tries to read 80,000 columns each from 10 or more rows at the same time, some of the nodes run out of heap space and terminate with OOM error. Here is the error message: java.lang.OutOfMemoryError: Java heap space at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107) at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:50) at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60) at org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:126) at org.apache.cassandra.db.filter.ColumnCounter$GroupByPrefix.count(ColumnCounter.java:96) at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:164) at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:136) at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84) at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:294) at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65) at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1363) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1220) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1132) at org.apache.cassandra.db.Table.getRow(Table.java:355) at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:70) at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1052) at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1578) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) ERROR 02:14:05,351 Exception in thread Thread[Thrift:6,5,main] java.lang.OutOfMemoryError: Java heap space at java.lang.Long.toString(Long.java:269) at java.lang.Long.toString(Long.java:764) at org.apache.cassandra.dht.Murmur3Partitioner$1.toString(Murmur3Partitioner.java:171) at org.apache.cassandra.service.StorageService.describeRing(StorageService.java:1068) at org.apache.cassandra.thrift.CassandraServer.describe_ring(CassandraServer.java:1192) at org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.getResult(Cassandra.java:3766) at org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.getResult(Cassandra.java:3754) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) The data in each column is less than 50 bytes. After adding all the column overheads (column name + metadata), it should not be more than 100 bytes. So reading 80,000 columns from 10 rows each means that we are reading 80,000 * 10 * 100 = 80 MB of data. It is large, but not large enough to fill up the 1.8 GB heap. So I wonder why the heap is getting full. If the data request is too big to fill in a reasonable amount of time, I would expect Cassandra to return a TimeOutException instead of terminating. One easy solution is to increase the heapsize. 
However that means Cassandra can still crash if someone reads 100 rows. I wonder if there is some other Cassandra setting that I can tweak to prevent the OOM exception? Thanks, Mohammed
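Mohammed's back-of-envelope arithmetic checks out, but serialized size is not heap size; a rough sketch of the numbers (the 5x on-heap expansion factor below is an assumption for illustration, not a measured value):

```shell
# Estimate of the serialized request size, using the figures from the post.
columns_per_row=80000
rows=10
bytes_per_column=100   # ~50 bytes data + column name and metadata overhead
raw=$((columns_per_row * rows * bytes_per_column))
echo "serialized payload: $((raw / 1000000)) MB"
# Deserialized column objects on the JVM heap are typically several times
# larger than their on-disk form; with an assumed 5x factor, this single
# request can transiently need hundreds of MB of a 1.8 GB heap.
heap_estimate=$((raw * 5 / 1000000))
echo "rough on-heap footprint: ${heap_estimate} MB or more"
```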
Re: Cassandra terminates with OutOfMemory (OOM) error
Hello Mohammed, You should increase the heap space. You should also tune the garbage collection so young generation objects are collected faster, relieving pressure on the heap. We have been using JDK 7 and it uses G1 as the default collector. It does a better job than me trying to optimise the JDK 6 GC collectors. Bear in mind though that the OS will need memory, and so will the row cache and the file system. Actual memory usage will depend on the workload of your system. I'm sure you'll also get good advice from other members of the mailing list. Thanks Jabbar Azam On 21 June 2013 18:49, Mohammed Guller moham...@glassbeam.com wrote: We have a 3-node cassandra cluster on AWS. These nodes are running cassandra 1.2.2 and have 8GB memory. We didn't change any of the default heap or GC settings. So each node is allocating 1.8GB of heap space. The rows are wide; each row stores around 260,000 columns. We are reading the data using Astyanax. If our application tries to read 80,000 columns each from 10 or more rows at the same time, some of the nodes run out of heap space and terminate with OOM error.
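The "1.8GB on an 8GB box" figure comes from cassandra-env.sh picking a default heap. A sketch of the 1.2-era sizing rule, max(min(half the RAM, 1024MB), min(a quarter of the RAM, 8192MB)) (an approximation; check your own cassandra-env.sh for the exact logic):

```shell
# Reproduce the default MAX_HEAP_SIZE calculation for an 8 GB machine.
system_memory_in_mb=8192
half_capped=$((system_memory_in_mb / 2))
if [ "$half_capped" -gt 1024 ]; then half_capped=1024; fi
quarter_capped=$((system_memory_in_mb / 4))
if [ "$quarter_capped" -gt 8192 ]; then quarter_capped=8192; fi
if [ "$half_capped" -gt "$quarter_capped" ]; then
  max_heap=$half_capped
else
  max_heap=$quarter_capped
fi
echo "MAX_HEAP_SIZE=${max_heap}M"   # ~2G, consistent with the ~1.8GB reported
```

To override it, set MAX_HEAP_SIZE (and HEAP_NEWSIZE) explicitly in cassandra-env.sh rather than relying on the auto-calculation.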
Here is the error message: [stack trace snipped - identical to the trace in the original message above] The data in each column is less than 50 bytes. After adding all the column overheads (column name + metadata), it should not be more than 100 bytes. So reading 80,000 columns from 10 rows each means that we are reading 80,000 * 10 * 100 = 80 MB of data. It is large, but not large enough to fill up the 1.8 GB heap. So I wonder why the heap is getting full. If the data request is too big to fill
Cassandra driver performance question...
Hi All, I am using the jdbc driver and noticed that if I run the same query twice the second time it is much faster. I set up the row cache and column family cache and it did not seem to make a difference. I am wondering how to set up cassandra such that the first query is always as fast as the second one. The second one was 1.8 msec and the first 28 msec for the same exact parameters. I am using preparestatement. Thanks!
Re: Cassandra driver performance question...
Hello Tony, I would guess that the first query's data is put into the row cache and the filesystem cache. The second query gets the data from the row cache and/or the filesystem cache so it'll be faster. If you want to make it consistently faster, having a key cache will definitely help. The following advice from Aaron Morton will also help: You can also see what it looks like from the server side. nodetool proxyhistograms will show you full request latency recorded by the coordinator. nodetool cfhistograms will show you the local read latency; this is just the time it takes to read data on a replica and does not include network or wait times. If proxyhistograms shows most requests running faster than your app reports, the problem is in your app. http://mail-archives.apache.org/mod_mbox/cassandra-user/201301.mbox/%3ce3741956-c47c-4b43-ad99-dad8afc3a...@thelastpickle.com%3E Thanks Jabbar Azam On 21 June 2013 21:29, Tony Anecito adanec...@yahoo.com wrote: [original message quoted in full - snipped]
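The nodetool commands Aaron refers to, spelled out (1.2-era syntax; keyspace and column family names are placeholders):

```shell
# Coordinator-side request latency, including network and wait time:
nodetool -h localhost proxyhistograms
# Replica-side read latency for a single column family:
nodetool -h localhost cfhistograms MyKeyspace MyColumnFamily
```

Comparing the two against the latency your client measures tells you whether the extra time is spent in the cluster or in the application/driver.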
Re: [Cassandra] Replacing a cassandra node with one of the same IP
Please note that I am currently using version 1.2.2 of Cassandra. Also we are using virtual nodes. My question mainly stems from the fact that the nodes appear to be aware that the node uuid changes for the IP (from reading the logs), so I am just wondering if this means the hinted handoffs are also updated to reflect the new Cassandra node uuid. If that was the case, I would not think a nodetool cleanup would be necessary. - Forwarded Message - From: Robert Coli rc...@eventbrite.com To: user@cassandra.apache.org; Emalayan Vairavanathan svemala...@yahoo.com Sent: Thursday, 20 June 2013 11:40 AM Subject: Re: [Cassandra] Replacing a cassandra node [forwarded message quoted in full - snipped]
crashed while running repair
Hi, I am experimenting with Cassandra-1.2.4, and got a crash while running repair. The nodes have 24GB of ram with an 8GB heap. Any ideas on what I may have missed in the config? Log is below: ERROR [Thread-136019] 2013-06-22 06:30:05,861 CassandraDaemon.java (line 174) Exception in thread Thread[Thread-136019,5,main] FSReadError in /var/lib/cassandra/data/cut3/Price/cut3-Price-ib-44369-Index.db at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:200) at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.complete(MmappedSegmentedFile.java:168) at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:340) at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:319) at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:194) at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122) at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:238) at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:178) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78) Caused by: java.io.IOException: Map failed at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:748) at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:192) ... 8 more Caused by: java.lang.OutOfMemoryError: Map failed at sun.nio.ch.FileChannelImpl.map0(Native Method) at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:745) ... 9 more ERROR [Thread-136019] 2013-06-22 06:30:05,865 FileUtils.java (line 375) Stopping gossiper thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
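Worth noting: "OutOfMemoryError: Map failed" from FileChannelImpl.map usually points at the kernel's per-process mmap limit rather than the Java heap (an educated guess here, not a diagnosis from this log alone). A sketch of checking and raising it (the pgrep pattern and target value are illustrative):

```shell
# Count current memory mappings for the Cassandra process and compare
# with the kernel limit (pgrep pattern is illustrative):
pid=$(pgrep -f CassandraDaemon | head -n1)
wc -l < "/proc/$pid/maps"
sysctl vm.max_map_count
# Raise the limit if the mapping count is close to it:
sudo sysctl -w vm.max_map_count=1048575
```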
Re: Heap is not released and streaming hangs at 0%
bloom_filter_fp_chance = 0.7 is probably way too large to be effective and you'll probably have issues compacting deleted rows and get poor read performance with a value that high. I'd guess that anything larger than 0.1 might as well be 1.0. -Bryan On Fri, Jun 21, 2013 at 5:58 AM, srmore comom...@gmail.com wrote: [earlier thread quoted in full - snipped]
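Per Bryan's note, if the goal is fewer bloom filter bits rather than effectively none, a sketch of moving to 0.1 and rebuilding SSTables so the change actually takes effect (keyspace/CF names are placeholders; bloom filters are per-SSTable, so the rewrite must run on each node):

```shell
cqlsh <<'CQL'
ALTER TABLE myks.mycf WITH bloom_filter_fp_chance = 0.1;
CQL
# Rewrite existing SSTables so they pick up the new filter setting:
nodetool -h localhost upgradesstables myks mycf
```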
Updated sstable size for LCS, ran upgradesstables, file sizes didn't change
We're potentially considering increasing the size of our sstables for some column families from 10MB to something larger. In test, we've been trying to verify that the sstable file sizes change and then doing a bit of benchmarking. However when we alter the column family and then run nodetool upgradesstables -a keyspace columnfamily, the files in the data directory have been re-written, but the file sizes are the same. Is this the expected behavior? If not, what's the right way to upgrade them? If this is expected, how can we benchmark the read/write performance with varying sstable sizes? Thanks in advance! Andrew
Re: Updated sstable size for LCS, ran upgradesstables, file sizes didn't change
On Fri, Jun 21, 2013 at 4:40 PM, Andrew Bialecki andrew.biale...@gmail.com wrote: However when we alter the column family and then run nodetool upgradesstables -a keyspace columnfamily, the files in the data directory have been re-written, but the file sizes are the same. Is this the expected behavior? If not, what's the right way to upgrade them? If this is expected, how can we benchmark the read/write performance with varying sstable sizes? It is expected; upgradesstables/scrub/cleanup compactions work on a single sstable at a time, they are not capable of combining or splitting them. In theory you could probably: 1) start out with the largest size you want to test 2) stop your node 3) use sstable_split [1] to split sstables 4) start node, test 5) repeat 2-4 I am not sure if there is anything about level compaction which makes this infeasible. =Rob [1] https://github.com/pcmanus/cassandra/tree/sstable_split
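Rob's steps 2-4 as a loop, purely as a sketch: the sstable_split command line below is hypothetical (check the tool in the linked branch for its real invocation), and the data paths are placeholders:

```shell
sudo service cassandra stop
# Hypothetical invocation: split every data file of the CF into smaller
# sstables at the target size.
for f in /var/lib/cassandra/data/myks/mycf/*-Data.db; do
  sstable_split "$f"
done
sudo service cassandra start
# ...benchmark, then repeat with the next smaller sstable_size_in_mb.
```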
Re: Updated sstable size for LCS, ran upgradesstables, file sizes didn't change
I think the new SSTables will be written at the new size. In order to do that, you need to trigger a compaction so that new SSTables are generated. For LCS, there is no major compaction though. You can run a nodetool repair and hopefully you will bring in some new SSTables and compactions will kick in. Or you can change the $CFName.json file under your data directory and move every SSTable to level 0. You need to stop your node, write a simple script to alter that file and start the node again. I think it would be helpful to have a nodetool command to change the SSTable size and trigger the rebuild of the SSTables. Thanks. -Wei - Original Message - From: Robert Coli rc...@eventbrite.com To: user@cassandra.apache.org Sent: Friday, June 21, 2013 4:51:29 PM Subject: Re: Updated sstable size for LCS, ran upgradesstables, file sizes didn't change [quoted message snipped - see above]
Re: Updated sstable size for LCS, ran upgradesstables, file sizes didn't change
I think you can remove the json file which stores the mapping of which sstable is in which level. This will be treated by cassandra as all sstables in level 0 which will trigger a compaction. But if you have lot of data, it will be very slow as you will keep compacting data between L1 and L0. This also happens when you write very fast and have a pile up in L0. A comment from the code will explain this what I am saying // LevelDB gives each level a score of how much data it contains vs its ideal amount, and // compacts the level with the highest score. But this falls apart spectacularly once you // get behind. Consider this set of levels: // L0: 988 [ideal: 4] // L1: 117 [ideal: 10] // L2: 12 [ideal: 100] // // The problem is that L0 has a much higher score (almost 250) than L1 (11), so what we'll // do is compact a batch of MAX_COMPACTING_L0 sstables with all 117 L1 sstables, and put the // result (say, 120 sstables) in L1. Then we'll compact the next batch of MAX_COMPACTING_L0, // and so forth. So we spend most of our i/o rewriting the L1 data with each batch. // // If we could just do *all* L0 a single time with L1, that would be ideal. But we can't // -- see the javadoc for MAX_COMPACTING_L0. // // LevelDB's way around this is to simply block writes if L0 compaction falls behind. // We don't have that luxury. // // So instead, we // 1) force compacting higher levels first, which minimizes the i/o needed to compact //optimially which gives us a long term win, and // 2) if L0 falls behind, we will size-tiered compact it to reduce read overhead until //we can catch up on the higher levels. // // This isn't a magic wand -- if you are consistently writing too fast for LCS to keep // up, you're still screwed. But if instead you have intermittent bursts of activity, // it can help a lot. On Fri, Jun 21, 2013 at 5:42 PM, Wei Zhu wz1...@yahoo.com wrote: I think the new SSTable will be in the new size. 
In order to do that, you need to trigger a compaction so that new SSTables are generated. For LCS there is no major compaction, though. You can run a nodetool repair, which will hopefully bring in some new SSTables so that compactions kick in. Or you can change the $CFName.json file under your data directory and move every SSTable to level 0: stop your node, write a simple script to alter that file, and start the node again. I think it would be helpful to have a nodetool command to change the SSTable size and trigger a rebuild of the SSTables. Thanks. -Wei

From: Robert Coli rc...@eventbrite.com
To: user@cassandra.apache.org
Sent: Friday, June 21, 2013 4:51:29 PM
Subject: Re: Updated sstable size for LCS, ran upgradesstables, file sizes didn't change

On Fri, Jun 21, 2013 at 4:40 PM, Andrew Bialecki andrew.biale...@gmail.com wrote: However, when we alter the column family and then run nodetool upgradesstables -a keyspace columnfamily, the files in the data directory are re-written, but the file sizes are the same. Is this the expected behavior? If not, what's the right way to upgrade them? If this is expected, how can we benchmark read/write performance with varying sstable sizes?

It is expected: upgradesstables/scrub/cleanup compactions work on a single sstable at a time; they are not capable of combining or splitting them. In theory you could probably:
1) start out with the largest size you want to test
2) stop your node
3) use sstable_split [1] to split sstables
4) start node, test
5) repeat 2-4
I am not sure if there is anything about leveled compaction which makes this infeasible. =Rob
[1] https://github.com/pcmanus/cassandra/tree/sstable_split
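For the "simple script to alter that file" step above, here is a minimal sketch. It assumes the pre-2.0 leveled manifest layout of `{"generations": [{"generation": N, "members": [...]}, ...]}` -- verify that against your own $CFName.json (and stop the node) before using anything like this:

```python
import json

# Sketch: move every sstable recorded in a leveled-compaction manifest
# back to level 0, so Cassandra re-levels (and therefore rewrites)
# everything at the new sstable size on the next compaction pass.
def reset_manifest_to_l0(manifest):
    members = []
    for gen in manifest.get("generations", []):
        members.extend(gen.get("members", []))
    manifest["generations"] = [{"generation": 0, "members": members}]
    return manifest

# Usage against a real file (node must be stopped first):
#   with open(path) as f:
#       manifest = json.load(f)
#   with open(path, "w") as f:
#       json.dump(reset_manifest_to_l0(manifest), f)
```

As the first message warns, doing this on a large data set means a long stretch of L0/L1 churn before the node catches up.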
Re: Heap is not released and streaming hangs at 0%
I would take a heap dump and see what's in there rather than guessing.

On Fri, Jun 21, 2013 at 4:12 PM, Bryan Talbot btal...@aeriagames.com wrote: bloom_filter_fp_chance = 0.7 is probably way too large to be effective; you'll probably have issues compacting deleted rows and get poor read performance with a value that high. I'd guess that anything larger than 0.1 might as well be 1.0. -Bryan

On Fri, Jun 21, 2013 at 5:58 AM, srmore comom...@gmail.com wrote:

On Fri, Jun 21, 2013 at 2:53 AM, aaron morton aa...@thelastpickle.com wrote: nodetool -h localhost flush didn't do much good. Do you have 100's of millions of rows? If so, see recent discussions about reducing the bloom_filter_fp_chance and index_sampling.

Yes, I have 100's of millions of rows.

If this is an old schema you may be using the very old setting of 0.000744, which creates a lot of bloom filters.

The bloom_filter_fp_chance value was changed from the default to 0.1. I looked at the filters and they are about 2.5G on disk, and I have around 8G of heap. I will try increasing the value to 0.7 and report my results. It also appears to be a case of hard GC failure (as Rob mentioned), as the heap is never released; even after 24+ hours of idle time, the JVM needs to be restarted to reclaim the heap.

Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 20/06/2013, at 6:36 AM, Wei Zhu wz1...@yahoo.com wrote: If you want, you can try to force the GC through JConsole: Memory > Perform GC. It theoretically triggers a full GC; when it happens depends on the JVM. -Wei

From: Robert Coli rc...@eventbrite.com
To: user@cassandra.apache.org
Sent: Tuesday, June 18, 2013 10:43:13 AM
Subject: Re: Heap is not released and streaming hangs at 0%

On Tue, Jun 18, 2013 at 10:33 AM, srmore comom...@gmail.com wrote: But then shouldn't the JVM GC it eventually?
I can still see Cassandra alive and kicking, but it looks like the heap is locked up even after the traffic has long stopped.

No, when the GC system fails this hard it is often a permanent failure which requires a restart of the JVM.

nodetool -h localhost flush didn't do much good.

This adds support to the idea that your heap is too full, and not full of memtables. You could try nodetool -h localhost invalidatekeycache, but that probably will not free enough memory to help you. =Rob
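To put numbers on the bloom filter discussion above, the standard sizing formula gives the optimal bits per key for a target false-positive chance p as -ln(p) / (ln 2)^2. A quick sketch (the 500M-key figure is illustrative, not taken from this thread) shows why the old 0.000744 default is so heavy and why anything past ~0.1 buys almost nothing back:

```python
import math

# Optimal bloom filter size per key for a target false-positive chance p
# (classic formula: bits/key = -ln(p) / (ln 2)^2).
def bloom_bits_per_key(p):
    return -math.log(p) / (math.log(2) ** 2)

# Rough on-heap/on-disk footprint for a given key count.
def bloom_size_gb(num_keys, p):
    return num_keys * bloom_bits_per_key(p) / 8 / 1024 ** 3

# Old default vs the relaxed settings discussed in this thread,
# for a hypothetical 500M keys:
for p in (0.000744, 0.1, 0.7):
    print(f"p={p}: {bloom_bits_per_key(p):.1f} bits/key, "
          f"{bloom_size_gb(500_000_000, p):.2f} GB for 500M keys")
```

Going from 0.000744 (~15 bits/key) to 0.1 (~4.8 bits/key) is a large saving; going from 0.1 to 0.7 (<1 bit/key) leaves a filter that, as Bryan says, might as well not exist.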
Re: crashed while running repair
Looks like a memory map failed. On a 64-bit system you should have effectively unlimited virtual memory, but Linux has a limit on the number of maps. Look at these two places:
http://stackoverflow.com/questions/8892143/error-when-opening-a-lucene-index-map-failed
https://blog.kumina.nl/2011/04/cassandra-java-io-ioerror-java-io-ioexception-map-failed/

On Fri, Jun 21, 2013 at 3:22 PM, Franc Carter franc.car...@sirca.org.au wrote: Hi, I am experimenting with Cassandra 1.2.4 and got a crash while running repair. The nodes have 24GB of RAM with an 8GB heap. Any ideas on what I may have missed in the config? Log is below.

ERROR [Thread-136019] 2013-06-22 06:30:05,861 CassandraDaemon.java (line 174) Exception in thread Thread[Thread-136019,5,main]
FSReadError in /var/lib/cassandra/data/cut3/Price/cut3-Price-ib-44369-Index.db
at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:200)
at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.complete(MmappedSegmentedFile.java:168)
at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:340)
at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:319)
at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:194)
at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:238)
at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:178)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78)
Caused by: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:748)
at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:192)
...
8 more
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:745)
... 9 more
ERROR [Thread-136019] 2013-06-22 06:30:05,865 FileUtils.java (line 375) Stopping gossiper

thanks
-- Franc Carter | Systems architect | Sirca Ltd
franc.car...@sirca.org.au | www.sirca.org.au
Tel: +61 2 8355 2514
Level 4, 55 Harrington St, The Rocks NSW 2000
PO Box H58, Australia Square, Sydney NSW 1215
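On Linux, the limit in question is vm.max_map_count, and the kernel refuses new mmap() calls once a process exceeds it, which surfaces in Java exactly as the OutOfMemoryError: Map failed above. A small sketch for checking how close a node is to the limit; the /proc paths are Linux-specific, and the parsing simply counts the one-mapping-per-line format of /proc/<pid>/maps:

```python
# Count memory mappings from /proc/<pid>/maps content (one mapping per line).
def count_maps(maps_text):
    return sum(1 for line in maps_text.splitlines() if line.strip())

# Compare a process's current mapping count against the kernel limit.
def read_limit_and_usage(pid):
    with open("/proc/sys/vm/max_map_count") as f:
        limit = int(f.read())
    with open(f"/proc/{pid}/maps") as f:
        used = count_maps(f.read())
    return used, limit

# If a mmap-heavy node (e.g. disk_access_mode: mmap with many sstables)
# is close to the limit, raise it:
#   sysctl -w vm.max_map_count=1048575
```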
Re: Cassandra terminates with OutOfMemory (OOM) error
Looks like you are putting a lot of pressure on the heap by doing a slice query on a large row. Do you have a lot of deletes/tombstones on the rows? That might be causing a problem. Also, why are you returning so many columns at once? You can use the auto-paginate feature in Astyanax. And do you see a lot of GC happening?

On Fri, Jun 21, 2013 at 1:13 PM, Jabbar Azam aja...@gmail.com wrote: Hello Mohammed, You should increase the heap space. You should also tune the garbage collection so young-generation objects are collected faster, relieving pressure on the heap. We have been using JDK 7 and it uses G1 as the default collector; it does a better job than me trying to optimise the JDK 6 GC collectors. Bear in mind though that the OS will need memory, as will the row cache and the file system, and memory usage will depend on the workload of your system. I'm sure you'll also get good advice from other members of the mailing list. Thanks Jabbar Azam

On 21 June 2013 18:49, Mohammed Guller moham...@glassbeam.com wrote: We have a 3-node cassandra cluster on AWS. These nodes are running cassandra 1.2.2 and have 8GB memory. We didn't change any of the default heap or GC settings, so each node is allocating 1.8GB of heap space. The rows are wide; each row stores around 260,000 columns. We are reading the data using Astyanax. If our application tries to read 80,000 columns each from 10 or more rows at the same time, some of the nodes run out of heap space and terminate with an OOM error.
Here is the error message:

java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107)
at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:50)
at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
at org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:126)
at org.apache.cassandra.db.filter.ColumnCounter$GroupByPrefix.count(ColumnCounter.java:96)
at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:164)
at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:136)
at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84)
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:294)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1363)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1220)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1132)
at org.apache.cassandra.db.Table.getRow(Table.java:355)
at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:70)
at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1052)
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1578)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)

ERROR 02:14:05,351 Exception in thread Thread[Thrift:6,5,main]
java.lang.OutOfMemoryError: Java heap space
at java.lang.Long.toString(Long.java:269)
at java.lang.Long.toString(Long.java:764)
at
org.apache.cassandra.dht.Murmur3Partitioner$1.toString(Murmur3Partitioner.java:171)
at org.apache.cassandra.service.StorageService.describeRing(StorageService.java:1068)
at org.apache.cassandra.thrift.CassandraServer.describe_ring(CassandraServer.java:1192)
at org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.getResult(Cassandra.java:3766)
at org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.getResult(Cassandra.java:3754)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)

The
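The auto-paginate suggestion above amounts to the following pattern: instead of materialising 80,000 columns per row in one slice (and one allocation spike on the replica), fetch fixed-size column pages and resume each page after the last column seen. A sketch of the pattern, where fetch_page is a hypothetical stand-in for the client's range-slice call and is assumed to return columns strictly after `start`:

```python
# Generic column-paging pattern (what Astyanax's autoPaginate(true) does
# for you): pull fixed-size pages, resuming after the last column seen.
def paged_columns(fetch_page, page_size=1000):
    start = None  # None means "from the beginning of the row"
    while True:
        page = fetch_page(start, page_size)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # short page: we've reached the end of the row
        start = page[-1]  # resume strictly after this column
```

With this, the client's peak memory (and the per-request work on the server) is bounded by page_size rather than by the row's total width.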
Re: Cassandra driver performance question...
Thanks Jabbar, I ran nodetool as suggested and it showed 0 latency for the row count I have. I also ran the cli list command for the table hit by my JDBC preparedStatement, and it was slow: about 121 msec the first time I ran it and 40 msec the second time, versus a JDBC call of 38 msec to start with; unless I run it twice as well, in which case I get 1.5-2.5 msec for executeQuery the second time the preparedStatement is called. I ran describe from the cli for the table and it said caching is ALL, which is correct. A real mystery, and I need to understand better what is going on. Regards, -Tony

From: Jabbar Azam aja...@gmail.com
To: user@cassandra.apache.org; Tony Anecito adanec...@yahoo.com
Sent: Friday, June 21, 2013 3:32 PM
Subject: Re: Cassandra driver performance question...

Hello Tony, I would guess that the first query's data is put into the row cache and the filesystem cache. The second query gets the data from the row cache and/or the filesystem cache, so it'll be faster. If you want to make it consistently faster, having a key cache will definitely help. The following advice from Aaron Morton will also help: You can also see what it looks like from the server side. nodetool proxyhistograms will show you full request latency recorded by the coordinator. nodetool cfhistograms will show you the local read latency; this is just the time it takes to read data on a replica and does not include network or wait times. If proxyhistograms shows most requests running faster than your app says, it's your app. http://mail-archives.apache.org/mod_mbox/cassandra-user/201301.mbox/%3ce3741956-c47c-4b43-ad99-dad8afc3a...@thelastpickle.com%3E Thanks Jabbar Azam

On 21 June 2013 21:29, Tony Anecito adanec...@yahoo.com wrote: Hi All, I am using the jdbc driver and noticed that if I run the same query twice, the second time it is much faster. I set up the row cache and column family cache and it does not seem to make a difference.
I am wondering how to set up Cassandra so that the first query is always as fast as the second one. The second one was 1.8 msec and the first 28 msec for the same exact parameters. I am using a PreparedStatement. Thanks!
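One way to pin down where the 28 msec vs 1.8 msec gap comes from is to measure cold vs warm latency explicitly for any query callable, which separates "the first call pays a one-off cost (cold caches, metadata fetch, statement preparation)" from "every call is slow". A small hypothetical harness; run_query is a placeholder for your own JDBC call:

```python
import time

# Measure first-call (cold) latency vs median steady-state (warm) latency
# for an arbitrary query callable.
def cold_vs_warm(run_query, warm_runs=20):
    t0 = time.perf_counter()
    run_query()
    cold_ms = (time.perf_counter() - t0) * 1000

    samples = []
    for _ in range(warm_runs):
        t0 = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - t0) * 1000)
    samples.sort()
    return cold_ms, samples[len(samples) // 2]  # (cold, median warm)
```

If cold is consistently ~15-20x the warm median, as in this thread, the fix is to pre-warm: issue the query (or prepare the statement) once at application startup, before user traffic arrives.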
Re: Cassandra driver performance question...
Hi Jabbar, I think I know what is going on. I happened across a change mentioned by the jdbc driver developers regarding metadata caching: it seems the metadata caching was moved from the Connection object to the PreparedStatement object. So I am wondering if the time difference I am seeing on the second preparedStatement is because the metadata is cached by then. My question is how to test this theory: is there a way to stop the metadata from coming across from Cassandra? A 20x performance improvement would be nice to have. Thanks, -Tony