Re: counters + replication = awful performance?
Counter replication works differently from that of normal writes. Namely, a counter update is written to a first replica, then a read is performed and the result of that is replicated to the other nodes. With RF=1, since there is only one replica, no read is involved, but in a way it's a degenerate case. So there are two reasons why RF=2 is much slower than RF=1: 1) it involves a read to replicate, and that read takes time. Especially if that read hits the disk, it may even dominate the insertion time. 2) the replication to the first replica and the replication to the rest of the replicas are not done in parallel but sequentially. Note that this is only true for the first replica versus the others. In other words, from RF=2 to RF=3 you should not see a significant performance degradation. Note that while there is nothing you can do for 2), you can try to speed up 1) by using the row cache, for instance (in case you weren't). In other words, with counters it is expected that RF=1 be potentially much faster than RF>1. That is the way counters work. And don't get me wrong, I'm not suggesting you should use RF=1 at all. What I am saying is that the performance you see with RF=2 is the performance of counters in Cassandra. -- Sylvain

On Wed, Nov 28, 2012 at 7:34 AM, Sergey Olefir solf.li...@gmail.com wrote: I think there might be a misunderstanding as to the nature of the problem. Say I have test set T, and two identical servers A and B. - I tested that server A (singly) is able to handle the load of T. - I tested that server B (singly) is able to handle the load of T. - I then join A and B in a cluster and set replication=2 -- this means that each server in effect has to handle the full test load individually (because there are two servers and replication=2, each server effectively has to handle all the data written to the cluster).
Under these circumstances it is reasonable to assume that cluster A+B should be able to handle load T, because each server is able to do so individually. HOWEVER, this is not the case. In fact, A+B together are only able to handle less than 1/3 of T, DESPITE the fact that A and B individually handle T just fine. I think there's something wrong with Cassandra replication (possibly as simple as me misconfiguring something) -- it shouldn't be three times faster to write to two separate nodes in parallel than to a 2-node Cassandra cluster with replication=2.

Edward Capriolo wrote: Say you are doing 100 inserts with RF=1 on two nodes. That is 50 inserts a node. If you go to RF=2 that is 100 inserts a node. If you were at 75% capacity on each node, you're now at 150%, which is not possible, so things bog down. To figure out what is going on we would need to see tpstats, iostat, and top information. I think you're looking at the performance the wrong way. Starting off at RF=1 is not the way to understand Cassandra performance. The benefits of scale-out don't happen until you fix your RF and increase your node count, i.e. 5 nodes at RF=3 is fast, 10 nodes at RF=3 even better.

On Tuesday, November 27, 2012, Sergey Olefir <solf.lists@> wrote: I already do a lot of in-memory aggregation before writing to Cassandra. The question here is what is wrong with Cassandra (or its configuration) that causes a huge performance drop when moving from 1-replication to 2-replication for counters -- and more importantly how to resolve the problem. A 2x-3x drop when moving from 1-replication to 2-replication on two nodes is reasonable. 6x is not. Like I said, with this kind of performance degradation it makes more sense to run two clusters with replication=1 in parallel rather than rely on Cassandra replication. And yes, Rainbird was the inspiration for what we are trying to do here :)

Edward Capriolo wrote: Cassandra's counters read on increment.
Additionally, they are distributed, so there can be multiple reads per increment. If they are not fast enough and you have exhausted all tuning options, add more servers to handle the load. In many cases incrementing the same counter n times can be avoided. Twitter's Rainbird did just that: it avoided multiple counter increments by batching them. I have done a similar thing using Cassandra and Kafka. https://github.com/edwardcapriolo/IronCount/blob/master/src/test/java/com/jointhegrid/ironcount/mockingbird/MockingBirdMessageHandler.java

On Tuesday, November 27, 2012, Sergey Olefir <solf.lists@> wrote: Hi, thanks for your suggestions. Regarding replicate=2 vs replicate=1 performance: I expected the configurations below to have similar performance: - single node, replicate = 1 - two nodes, replicate = 2 (okay, this probably should be a bit slower due to additional overhead). However, what I'm seeing is that the second option (replicate=2) is about THREE times slower
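To make Sylvain's two points concrete, here is a toy latency model (my own sketch, not Cassandra code; the class, method, and millisecond figures are all invented for illustration). It shows why the read-before-replicate dominates, and why going from RF=2 to RF=3 barely changes anything while RF=1 to RF=2 is a large jump:

```java
// Toy latency model for a single counter update -- NOT Cassandra code.
// Assumptions: the leader does one write plus one read, then replicates
// to the remaining rf-1 replicas in parallel (so they cost one write).
public class CounterLatencyModel {

    /**
     * Estimated time for one counter update.
     * writeMs - cost of one replica write
     * readMs  - cost of the read-before-replicate on the leader
     */
    public static double updateLatencyMs(int rf, double writeMs, double readMs) {
        if (rf <= 1) {
            return writeMs; // single replica: no read, no second hop
        }
        // leader write + leader read (sequential), THEN one parallel
        // fan-out write to the other rf-1 replicas
        return writeMs + readMs + writeMs;
    }

    public static void main(String[] args) {
        System.out.println("RF=1: " + updateLatencyMs(1, 1.0, 5.0)); // 1.0
        System.out.println("RF=2: " + updateLatencyMs(2, 1.0, 5.0)); // 7.0
        System.out.println("RF=3: " + updateLatencyMs(3, 1.0, 5.0)); // 7.0
    }
}
```

With a disk-bound read (readMs large), the model predicts exactly the shape reported in this thread: a several-fold drop from RF=1 to RF=2, and almost none from RF=2 to RF=3.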
Re: counters + replication = awful performance?
On Tue, Nov 27, 2012 at 3:21 PM, Edward Capriolo edlinuxg...@gmail.com wrote: I misspoke really. It is not dangerous, you just have to understand what it means. This JIRA discusses it. https://issues.apache.org/jira/browse/CASSANDRA-3868

Per Sylvain on the referenced ticket: "I don't disagree about the efficiency of the valve, but at what price? 'Bootstrapping a node will make you lose increments (you don't know which ones, you don't know how many, and this even if nothing goes wrong)' is a pretty bad drawback. That is pretty much why that option makes me uncomfortable: it does give you better performance, so people may be tempted to use it. Now if it was only a matter of replicating writes only through read-repair/repair, then ok, it's pretty dangerous but it's rather easy to explain/understand the drawback (if you don't lose a disk, you don't lose increments, and you'd better use CL.ALL or set read_repair_chance to 1). But the fact that it doesn't work with bootstrap/move makes me wonder if having the option at all is not doing a disservice to users."

To me, anything that can be described as "will make you lose increments (you don't know which ones, you don't know how many, and this even if nothing goes wrong)" and which therefore doesn't work with bootstrap/move is correctly described as dangerous. :D =Rob -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb
Re: need some help with row cache
The row cache itself is global and its size is set with row_cache_size_in_mb, but it must be enabled per CF using the proper settings. CQL3 isn't complete yet in C* 1.1, so if the cache settings aren't shown there, then you'll probably need to use cassandra-cli. -Bryan

On Tue, Nov 27, 2012 at 10:41 PM, Wz1975 wz1...@yahoo.com wrote: Use cassandra-cli. Thanks. -Wei Sent from my Samsung smartphone on AT&T

Original message -- Subject: Re: need some help with row cache From: Yiming Sun yiming@gmail.com To: user@cassandra.apache.org CC: Also, what command can I use to see the caching setting? DESC TABLE cf doesn't list caching at all. Thanks. -- Y.

On Wed, Nov 28, 2012 at 12:15 AM, Yiming Sun yiming@gmail.com wrote: Hi Bryan, thank you very much for this information. So in other words, settings such as row_cache_size_in_mb in the YAML alone are not enough, and I must also specify the caching attribute on a per column family basis? -- Y.

On Tue, Nov 27, 2012 at 11:57 PM, Bryan Talbot btal...@aeriagames.com wrote: On Tue, Nov 27, 2012 at 8:16 PM, Yiming Sun yiming@gmail.com wrote: Hello, it is not clear to me where this setting belongs, because even in the v1.1.6 conf/cassandra.yaml there is no such property, and apparently adding this property to the yaml causes a fatal configuration error upon server startup. It's a per column family setting that can be applied using the CLI or CQL. With CQL3 it would be ALTER TABLE cf WITH caching = 'rows_only'; to enable the row cache but no key cache for that CF. -Bryan
Re: Other problem in update
The problem was that my unit tests were not cleaning up their data directory and there was some corrupt data in there. It was fixed by deleting the directory manually. Thanks

2012/11/27 Tupshin Harper tups...@tupshin.com: Unless I'm misreading the git history, the stack trace you referenced isn't from 1.1.2. In particular, the writeHintForMutation method in StorageProxy.java wasn't added to the codebase until September 9th ( https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commitdiff;h=b38ca2879cf1cbf5de17e1912772b6588eaa7de6 ), and wasn't part of any release until 1.2.0-beta1. -Tupshin

On Tue, Nov 27, 2012 at 7:40 AM, Everton Lima peitin.inu...@gmail.com wrote: writeHintForMutation -- Everton Lima Aleixo Bacharel em Ciencia da Computação Universidade Federal de Goiás
Data backup and restore
Dear All, I have a Cassandra 1.1.4 cluster with 2 nodes. I need to take a backup and restore it on staging for testing purposes. I took a snapshot with the command below, but it created a snapshot for every keyspace's column family. Is there a quicker way to take a backup and restore it?

/opt/apache-cassandra-1.1.4/bin/nodetool -h localhost snapshot -t cassandra_bkup

Snapshot directory: /var/log/cassandra/data/KeySpace/subfolder/snapshot/cassandra_bkup

-- Thanks & Regards, Adeel Akbar
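For the restore half, one common approach is simply to copy the SSTable files from a snapshot directory back into the column family's data directory and then restart the node (or load them with nodetool refresh on newer versions). The sketch below is my own illustration of that file copy, assuming a directory layout like the one in the mail (data/Keyspace/CF/snapshots/tag); all paths and the demo file name are examples, not Cassandra APIs:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch: copy SSTable component files (*-Data.db, *-Index.db, ...) from a
// snapshot directory into a column family data directory. Illustrative only.
public class SnapshotRestore {

    public static int restore(Path snapshotDir, Path cfDataDir) throws IOException {
        Files.createDirectories(cfDataDir);
        int copied = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(snapshotDir)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    Files.copy(f, cfDataDir.resolve(f.getFileName()),
                               StandardCopyOption.REPLACE_EXISTING);
                    copied++;
                }
            }
        }
        return copied; // restart the node afterwards so it picks the files up
    }

    // Self-contained demo against temp directories; returns the number of
    // files restored, or -1 on I/O error.
    public static int demo() {
        try {
            Path tmp = Files.createTempDirectory("snapshot-restore-demo");
            Path snap = Files.createDirectories(
                    tmp.resolve("data/KeySpace/cf/snapshots/cassandra_bkup"));
            Files.write(snap.resolve("cf-hd-1-Data.db"), new byte[] {0});
            return restore(snap, tmp.resolve("restored/KeySpace/cf"));
        } catch (IOException e) {
            return -1;
        }
    }
}
```

Per-keyspace selection then reduces to choosing which snapshot directories to copy, instead of snapshotting everything.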
Re: Upgrade
Yes.

java.lang.NullPointerException
at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:796)
at org.apache.cassandra.thrift.ThriftSessionManager.currentSession(ThriftSessionManager.java:53)
at org.apache.cassandra.thrift.CassandraServer.state(CassandraServer.java:88)
at org.apache.cassandra.thrift.CassandraServer.system_add_keyspace(CassandraServer.java:1345)
at harpia.ns.storage.cassandra.CassandraHelper.setupKeyspace(CassandraHelper.java:179)
at harpia.ns.storage.cassandra.CassandraHelper.startInstance(CassandraHelper.java:154)
at harpia.ns.storage.cassandra.CassandraStorageService.init(CassandraStorageService.java:129)
at harpia.ns.storage.StorageServiceFactory.createInstance(StorageServiceFactory.java:39)
at harpia.ns.storage.StorageServiceFactory.createInstanceFor(StorageServiceFactory.java:29)
at harpia.ns.NodeServer.init(NodeServer.java:82)
at harpia.ns.NodeServerFactory.createNodeServer(NodeServerFactory.java:8)
at harpia.ns.StartNodeServer.run(StartNodeServer.java:56)

Does anyone know why the assignment disappeared? initialized = true in the class StorageService, method initServer(int delay): in version 1.1.6 it is set in this method, but in version 1.2.0-beta2 it does not occur, so in my code I cannot verify whether the node is initialized.

2012/11/28 aaron morton aa...@thelastpickle.com: Do you have the error stack? Cheers - Aaron Morton, Freelance Cassandra Developer, New Zealand, @aaronmorton, http://www.thelastpickle.com

On 28/11/2012, at 12:28 AM, Everton Lima peitin.inu...@gmail.com wrote: Hello people. I was using Cassandra 1.1.6 and used the object CassandraServer to create keyspaces from my code.
But when I updated to version 1.2.0-beta2, my code started to throw a NullPointerException in this method.

In version 1.1.6, CassandraServer.state() was:

    SocketAddress remoteSocket = SocketSessionManagementService.remoteSocket.get();
    if (remoteSocket == null)
        return clientState.get();
    ClientState cState = SocketSessionManagementService.instance.get(remoteSocket);
    if (cState == null) {
        cState = new ClientState();
        SocketSessionManagementService.instance.put(remoteSocket, cState);
    }
    return cState;

In version 1.2.0, CassandraServer.state() is:

    return ThriftSessionManager.instance.currentSession();

and currentSession() is:

    SocketAddress socket = remoteSocket.get();
    assert socket != null;
    ThriftClientState cState = activeSocketSessions.get(socket);
    if (cState == null) {
        cState = new ThriftClientState();
        activeSocketSessions.put(socket, cState);
    }
    return cState;

So, in version 1.1.6 it checks whether there is a remote connection and, if not, falls back to the local one. In version 1.2.0 it fetches the remote connection and looks up a ThriftClientState for it, but if there is no remote connection (unlike in 1.1.6) it will throw a NullPointerException at the line: ThriftClientState cState = activeSocketSessions.get(socket);

Is there any way to use CassandraServer in the new version? Thanks! -- Everton Lima Aleixo Bacharel em Ciencia da Computação Universidade Federal de Goiás
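For embedded callers that have no Thrift socket, one workaround is to keep the 1.1.6-style fallback in your own session-manager wrapper rather than relying on CassandraServer's. The sketch below is illustrative application code (the class and method names are mine, not a patch to ThriftSessionManager): it returns a per-socket state when a remote socket is registered, and a thread-local "embedded" state otherwise, instead of hitting the null-socket path:

```java
import java.net.SocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a 1.1.6-style fallback: no registered socket means "embedded
// caller", so hand back a thread-local state instead of failing.
public class EmbeddedSessionManager {
    public static final class ClientState { /* placeholder for per-session state */ }

    private static final ThreadLocal<SocketAddress> remoteSocket = new ThreadLocal<>();
    private static final ThreadLocal<ClientState> localState =
            ThreadLocal.withInitial(ClientState::new);
    private static final Map<SocketAddress, ClientState> sessions = new ConcurrentHashMap<>();

    public static void setSocket(SocketAddress addr) { remoteSocket.set(addr); }

    public static ClientState currentSession() {
        SocketAddress socket = remoteSocket.get();
        if (socket == null) {
            return localState.get();                  // embedded: no Thrift socket
        }
        return sessions.computeIfAbsent(socket, s -> new ClientState());
    }
}
```

The same embedded thread then always sees the same state object, which is the behavior the 1.1.6 code provided.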
Re: need some help with row cache
Thanks guys. However, after I ran the client code several times (same set of 5000 entries), 2 of the 6 nodes still show 0 hits on the row cache, despite each node having 1 GB of row cache capacity and the caches being full. Since I always request the same entries over and over again, shouldn't there be some hits?

[user@node]$ ./checkinfo.sh
Token: 85070591730234615865843651857942052863
Gossip active: true
Thrift active: true
Load: 587.15 GB
Generation No: 1354074048
Uptime (seconds): 36957
Heap Memory (MB): 2027.29 / 3948.00
Data Center: DC1
Rack: r2
Exceptions: 0
Key Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 14400 save period in seconds
Row Cache: size 1072651974 (bytes), capacity 1073741824 (bytes), 0 hits, 2576 requests, NaN recent hit rate, 0 save period in seconds

Token: 141784319550391026443072753096570088105
Gossip active: true
Thrift active: true
Load: 583.21 GB
Generation No: 1354074461
Uptime (seconds): 36535
Heap Memory (MB): 828.71 / 3948.00
Data Center: DC1
Rack: r2
Exceptions: 0
Key Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 14400 save period in seconds
Row Cache: size 1072602906 (bytes), capacity 1073741824 (bytes), 0 hits, 3194 requests, NaN recent hit rate, 0 save period in seconds

On Wed, Nov 28, 2012 at 4:26 AM, Bryan Talbot btal...@aeriagames.com wrote: The row cache itself is global and the size is set with row_cache_size_in_mb. It must be enabled per CF using the proper settings. CQL3 isn't complete yet in C* 1.1 so if the cache settings aren't shown there, then you'll probably need to use cassandra-cli. -Bryan

On Tue, Nov 27, 2012 at 10:41 PM, Wz1975 wz1...@yahoo.com wrote: Use cassandra-cli. Thanks. -Wei Sent from my Samsung smartphone on AT&T Original message -- Subject: Re: need some help with row cache From: Yiming Sun yiming@gmail.com To: user@cassandra.apache.org CC: Also, what command can I use to see the caching setting?
DESC TABLE cf doesn't list caching at all. Thanks. -- Y. On Wed, Nov 28, 2012 at 12:15 AM, Yiming Sun yiming@gmail.com wrote: Hi Bryan, Thank you very much for this information. So in other words, the settings such as row_cache_size_in_mb in YAML alone are not enough, and I must also specify the caching attribute on a per column family basis? -- Y. On Tue, Nov 27, 2012 at 11:57 PM, Bryan Talbot btal...@aeriagames.com wrote: On Tue, Nov 27, 2012 at 8:16 PM, Yiming Sun yiming@gmail.com wrote: Hello, but it is not clear to me where this setting belongs to, because even in the v1.1.6 conf/cassandra.yaml, there is no such property, and apparently adding this property to the yaml causes a fatal configuration error upon server startup, It's a per column family setting that can be applied using the CLI or CQL. With CQL3 it would be ALTER TABLE cf WITH caching = 'rows_only'; to enable the row cache but no key cache for that CF. -Bryan
Re: need some help with row cache
Does replica placement play a role in row cache hits? I happened to notice that the 3 nodes on rack 2 are the ones with no recent hit rates, even when I specify only one node from rack 2 as the host to Hector. The cluster uses PropertyFileSnitch, and the nodes alternate between rac1 and rac2 in a single data center, clockwise around the ring. This particular column family uses NetworkTopologyStrategy with a replication factor of 2, so the idea is that it can place the replica on the next node in the ring without having to walk all the way around. But it seems cache hits tend to only happen on rack 1?

Address   DC   Rack  Status  State   Load       Effective-Ownership  Token
                                                                     141784319550391026443072753096570088105
x.x.x.1   DC1  r1    Up      Normal  587.46 GB  33.33%               0
x.x.x.2   DC1  r2    Up      Normal  591.21 GB  33.33%               28356863910078205288614550619314017621
x.x.x.3   DC1  r1    Up      Normal  594.97 GB  33.33%               56713727820156410577229101238628035242
x.x.x.4   DC1  r2    Up      Normal  587.15 GB  33.33%               85070591730234615865843651857942052863
x.x.x.5   DC1  r1    Up      Normal  590.26 GB  33.33%               113427455640312821154458202477256070484
x.x.x.6   DC1  r2    Up      Normal  583.21 GB  33.33%               141784319550391026443072753096570088105

On Wed, Nov 28, 2012 at 9:09 AM, Yiming Sun yiming@gmail.com wrote: Thanks guys. However, after I ran the client code several times (same set of 5000 entries), 2 of the 6 nodes still show 0 hits on the row cache, despite each node having 1 GB of row cache capacity and the caches being full. Since I always request the same entries over and over again, shouldn't there be some hits?
[user@node]$ ./checkinfo.sh
Token: 85070591730234615865843651857942052863
Gossip active: true
Thrift active: true
Load: 587.15 GB
Generation No: 1354074048
Uptime (seconds): 36957
Heap Memory (MB): 2027.29 / 3948.00
Data Center: DC1
Rack: r2
Exceptions: 0
Key Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 14400 save period in seconds
Row Cache: size 1072651974 (bytes), capacity 1073741824 (bytes), 0 hits, 2576 requests, NaN recent hit rate, 0 save period in seconds

Token: 141784319550391026443072753096570088105
Gossip active: true
Thrift active: true
Load: 583.21 GB
Generation No: 1354074461
Uptime (seconds): 36535
Heap Memory (MB): 828.71 / 3948.00
Data Center: DC1
Rack: r2
Exceptions: 0
Key Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 14400 save period in seconds
Row Cache: size 1072602906 (bytes), capacity 1073741824 (bytes), 0 hits, 3194 requests, NaN recent hit rate, 0 save period in seconds

On Wed, Nov 28, 2012 at 4:26 AM, Bryan Talbot btal...@aeriagames.com wrote: The row cache itself is global and the size is set with row_cache_size_in_mb. It must be enabled per CF using the proper settings. CQL3 isn't complete yet in C* 1.1 so if the cache settings aren't shown there, then you'll probably need to use cassandra-cli. -Bryan

On Tue, Nov 27, 2012 at 10:41 PM, Wz1975 wz1...@yahoo.com wrote: Use cassandra-cli. Thanks. -Wei Sent from my Samsung smartphone on AT&T Original message -- Subject: Re: need some help with row cache From: Yiming Sun yiming@gmail.com To: user@cassandra.apache.org CC: Also, what command can I use to see the caching setting? DESC TABLE cf doesn't list caching at all. Thanks. -- Y. On Wed, Nov 28, 2012 at 12:15 AM, Yiming Sun yiming@gmail.com wrote: Hi Bryan, Thank you very much for this information.
So in other words, the settings such as row_cache_size_in_mb in YAML alone are not enough, and I must also specify the caching attribute on a per column family basis? -- Y. On Tue, Nov 27, 2012 at 11:57 PM, Bryan Talbot btal...@aeriagames.com wrote: On Tue, Nov 27, 2012 at 8:16 PM, Yiming Sun yiming@gmail.com wrote: Hello, but it is not clear to me where this setting belongs to, because even in the v1.1.6 conf/cassandra.yaml, there is no such property, and apparently adding this property to the yaml causes a fatal configuration error upon server startup, It's a per column family setting that can be applied using the CLI or CQL. With CQL3 it would be ALTER TABLE cf WITH caching = 'rows_only'; to enable the row cache but no key cache for that CF. -Bryan
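Replica placement can explain lopsided cache hits, since with RF=2 and racks alternating around the ring, the second replica of each range lands on the next node (which is always in the other rack). The toy model below is my own simplified stand-in for NetworkTopologyStrategy, not its actual algorithm: it picks the primary by token and prefers a different rack for the next replica, which is enough to reason about which nodes serve which keys:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Simplified replica-placement model: primary = first node with token >= key,
// next replicas prefer racks not already used. Illustrative only; the real
// NetworkTopologyStrategy logic is more involved.
public class RingPlacement {
    public static final class Node {
        public final String addr;
        public final String rack;
        public final BigInteger token;
        public Node(String addr, String rack, BigInteger token) {
            this.addr = addr; this.rack = rack; this.token = token;
        }
    }

    // ring must be sorted by token
    public static List<Node> replicasFor(BigInteger key, List<Node> ring, int rf) {
        int primary = 0;
        while (primary < ring.size() && ring.get(primary).token.compareTo(key) < 0) primary++;
        primary %= ring.size(); // wrap around the ring
        List<Node> replicas = new ArrayList<>();
        replicas.add(ring.get(primary));
        for (int i = 1; i < ring.size() && replicas.size() < rf; i++) {
            Node candidate = ring.get((primary + i) % ring.size());
            boolean rackSeen = replicas.stream().anyMatch(n -> n.rack.equals(candidate.rack));
            // prefer an unused rack; fall back if we are running out of nodes
            if (!rackSeen || replicas.size() + (ring.size() - i) <= rf) replicas.add(candidate);
        }
        return replicas;
    }
}
```

With racks alternating r1/r2 as in the ring above, every key's two replicas straddle both racks, so which rack gets the cache hits depends on which replica each read is routed to, not on the placement alone.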
Re: counters + replication = awful performance?
I may be wrong, but during a bootstrap hints can be silently discarded if the node they are destined for leaves the ring. There are a large number of people using counters for 5-minute real-time statistics; on the back end they use ETL-based reporting to compute the true stats at an hourly or daily interval. A user like this might benefit from DANGER counters. They are not looking for perfection, only better performance, and the counter row keys themselves roll over in 5 minutes anyway. Options like this are also great for winning benchmarks. When some other NoSQL (that is not as fast as C*) wants to win a benchmark, they turn the WAL off or on, or write acks, or something else that compromises their ACID/CAP story for the purpose of winning. We need our own secret awesome-sauce dangerous options too! jk

On Wed, Nov 28, 2012 at 4:21 AM, Rob Coli rc...@palominodb.com wrote: On Tue, Nov 27, 2012 at 3:21 PM, Edward Capriolo edlinuxg...@gmail.com wrote: I misspoke really. It is not dangerous, you just have to understand what it means. This JIRA discusses it. https://issues.apache.org/jira/browse/CASSANDRA-3868 Per Sylvain on the referenced ticket: "I don't disagree about the efficiency of the valve, but at what price? 'Bootstrapping a node will make you lose increments (you don't know which ones, you don't know how many, and this even if nothing goes wrong)' is a pretty bad drawback. That is pretty much why that option makes me uncomfortable: it does give you better performance, so people may be tempted to use it. Now if it was only a matter of replicating writes only through read-repair/repair, then ok, it's pretty dangerous but it's rather easy to explain/understand the drawback (if you don't lose a disk, you don't lose increments, and you'd better use CL.ALL or set read_repair_chance to 1). But the fact that it doesn't work with bootstrap/move makes me wonder if having the option at all is not doing a disservice to users."
To me, anything that can be described as "will make you lose increments (you don't know which ones, you don't know how many, and this even if nothing goes wrong)" and which therefore doesn't work with bootstrap/move is correctly described as dangerous. :D =Rob -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb
Re: outOfMemory error
Well, asking for 500 MB of data at once from a server with such modest specs is asking for trouble. Here are my suggestions:
- Disable the 1 GB row cache
- Consider allocating that memory to the Java heap (-Xms2500m -Xmx2500m)
- Don't fetch all the columns at once -- page through them a slice at a time
- Increase the memtable size to more than 64 MB if you want to write data to this cluster
-Bryan

On Wed, Nov 28, 2012 at 5:06 AM, Damien Lejeune d.leje...@pepite.be wrote: Hi all, I'm currently experiencing an OutOfMemory problem with Cassandra-1.1.6 on Windows XP Pro (32-bit). The server crashes when I query it with a relatively small amount of data (around 100 rows with 5 columns each; to be precise, on my configuration, querying 75 or more rows makes the server crash). I tried different libraries (Hector, JDBC, Thrift) and the Cassandra stress tool; all lead to the same OutOfMemory problem. My dataset is composed, for each row, of 1 column in DateType and 4 columns in DoubleType. I ran a query to fetch the entire dataset (around 330 MB for the raw data + around 200 MB for the metadata) and got the log at the end of this message. I also checked the heap dump with MAT, which displays these top values:

Class Name                      Objects     Shallow Heap
java.nio.HeapByteBuffer         16,253,559  780,170,832
byte[]                          16,254,013  330,207,640   -- data?
java.util.TreeMap$Entry         8,126,711   260,054,752
org.apache.cassandra.db.Column  8,116,589   194,798,136   -- metadata?

I tried to change the configuration in Cassandra for the values:
- row_cache_size_in_mb: tried different values in [0,1000] MB
- flush_largest_memtables_at: set to 0.1, but tried with 0.75
- reduce_cache_sizes_at: tried 0.85, 0.6, 0.2 and 0.1
- reduce_cache_capacity_to: tried 0.6 and 0.15
- memtable_total_space_in_mb: 64 MB, but also tried disabling it (defaults to 1/3 of the heap)
- -Xms1G -Xmx1500M
with no real observable improvement regarding my problem. My Cassandra server and client both run on the same machine.
Here are the characteristics of my system configuration:
- Cassandra-1.1.6
- Java 1.6.0_20; Java(TM) SE Runtime Environment (build 1.6.0_20-b02); Java HotSpot(TM) Client VM (build 16.3-b01, mixed mode, sharing)
- Windows XP Pro 32-bit with Service Pack 3
- Dual-core 32-bit CPU @ 2.26 GHz
- 3.48 GB of RAM

I'm aware that my system configuration is not an optimized environment for running Cassandra efficiently, but I wonder if you know a workaround (or have any idea on how) to fix this problem. Part of the answer is probably that I do not have enough RAM to run the process, but I also wonder if it is 'normal' behaviour for Cassandra to handle this particular test case that way. Cheers, Damien

Cassandra's LOG ---
Starting Cassandra Server
INFO 09:10:27,171 Logging initialized
INFO 09:10:27,171 JVM vendor/version: Java HotSpot(TM) Client VM/1.6.0_18
INFO 09:10:27,171 Heap size: 1072103424/1569521664
INFO 09:10:27,171 Classpath:
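Bryan's "page through them a slice at a time" advice can be sketched as follows. This is my own illustration of the paging pattern, using a TreeMap to stand in for a wide row; a real client would instead issue repeated slice queries with a start-column and a limit (the method and variable names are invented):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch: fetch a wide row pageSize columns at a time, resuming each page
// just after the last column name seen, instead of materializing the whole
// row (hundreds of MB) in the heap at once.
public class ColumnPager {

    public static List<String> fetchAll(TreeMap<String, Double> row, int pageSize) {
        List<String> seen = new ArrayList<>();
        String start = null;                       // null = from the first column
        while (true) {
            SortedMap<String, Double> tail =
                    (start == null) ? row : row.tailMap(start, false); // exclusive restart
            int taken = 0;
            for (String col : tail.keySet()) {
                if (taken++ == pageSize) break;    // we peeked one past the page
                seen.add(col);
                start = col;                       // resume after this column
            }
            if (taken <= pageSize) break;          // short page: row exhausted
        }
        return seen;
    }
}
```

Each iteration holds at most one page in memory, which is exactly why paging avoids the OutOfMemory that a single whole-row fetch triggers on a small heap.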
Re: Java high-level client
+1 On Tue, Nov 27, 2012 at 10:10 AM, Michael Kjellman mkjell...@barracuda.com wrote: Netflix has a great client: https://github.com/Netflix/astyanax
Re: Java high-level client
We are using Hector now. What is the major advantage of astyanax over Hector? Thanks. -Wei From: Andrey Ilinykh ailin...@gmail.com To: user@cassandra.apache.org Sent: Wednesday, November 28, 2012 9:37 AM Subject: Re: Java high-level client +1 On Tue, Nov 27, 2012 at 10:10 AM, Michael Kjellman mkjell...@barracuda.com wrote: Netflix has a great client https://github.com/Netflix/astyanax
How to query secondary indexes
Hi, according to the documentation on indexes ( http://www.datastax.com/docs/1.1/ddl/indexes ), in order to use WHERE on a column which is not part of my key, I must define a secondary index on it. However, I can only use equality comparisons on it, while I wish to use other comparison operators like greater-than. Let's say I have a room with people, and at every timestamp I measure the temperature of the room and the number of people. I use the timestamp as my key, and I want to select all timestamps where the temperature was over 50 degrees, but I can't seem to be able to do it with a regular query, even if I define that column as a secondary index:

SELECT * FROM MyTable WHERE temp > 50.4571;

My lame workaround is to define a secondary index on NumOfPeopleInRoom and then query for a specific value:

SELECT * FROM MyTable WHERE NumOfPeopleInRoom = 7 AND temp > 50.4571;

I'm pretty sure this is not the proper way for me to do this. How should I attack this? It feels like I'm missing a very basic concept. I'd appreciate it if your answers also include the option of not changing my schema. Thanks!!!
Re: How to query secondary indexes
You're going to have a problem doing this in a single query, because you're asking Cassandra to select a non-contiguous set of rows. Also, to my knowledge, you can only use non-equality operators on clustering keys. The best solution I could come up with would be to define your table like so:

CREATE TABLE room_data (
  room_id uuid,
  in_room int,
  temp float,
  time timestamp,
  PRIMARY KEY (room_id, in_room, temp)
);

Then run 2 queries:

SELECT * FROM room_data WHERE in_room > 7;
SELECT * FROM room_data WHERE temp > 50.0;

And do an intersection on the results. I should add the disclaimer that I am relatively new to CQL, so there may be a better way to do this. Blake

On Wed, Nov 28, 2012 at 10:02 AM, Oren Karmi oka...@gmail.com wrote: Hi, according to the documentation on indexes ( http://www.datastax.com/docs/1.1/ddl/indexes ), in order to use WHERE on a column which is not part of my key, I must define a secondary index on it. However, I can only use equality comparisons on it, while I wish to use other comparison operators like greater-than. Let's say I have a room with people, and at every timestamp I measure the temperature of the room and the number of people. I use the timestamp as my key, and I want to select all timestamps where the temperature was over 50 degrees, but I can't seem to be able to do it with a regular query, even if I define that column as a secondary index: SELECT * FROM MyTable WHERE temp > 50.4571; My lame workaround is to define a secondary index on NumOfPeopleInRoom and then query for a specific value: SELECT * FROM MyTable WHERE NumOfPeopleInRoom = 7 AND temp > 50.4571; I'm pretty sure this is not the proper way for me to do this. How should I attack this? It feels like I'm missing a very basic concept. I'd appreciate it if your answers also include the option of not changing my schema. Thanks!!!
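The client-side intersection Blake suggests is straightforward: run the two queries separately, collect the row identifiers from each, and intersect in the application. A minimal sketch (my own, with invented names; with large result sets you would page both queries rather than hold them whole):

```java
import java.util.Set;
import java.util.TreeSet;

// Sketch: intersect the row identifiers (here, timestamps as strings)
// returned by two separate range queries.
public class QueryIntersection {

    public static Set<String> intersect(Set<String> matchesInRoom, Set<String> matchesTemp) {
        Set<String> both = new TreeSet<>(matchesInRoom); // TreeSet keeps timestamps ordered
        both.retainAll(matchesTemp);
        return both;
    }
}
```

The trade-off of this design is bandwidth: both result sets travel to the client, so it only pays off when each predicate is reasonably selective on its own.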
Re: counters + replication = awful performance?
Well, that is sad news then. I don't think I can consider 20k increments per second for a two-node cluster (with RF=2) reasonable performance (cost vs. benefit). I might have to look into other storage solutions, or perhaps experiment with duplicate clusters with RF=1, or replicate_on_write=false. Although yes, I probably should try that row cache you mentioned -- I saw that the key cache was going unused (so I saw no reason to try to enable the row cache), but I think that was with RF=1; it might be different with RF=2.

Sylvain Lebresne wrote: Counter replication works differently from that of normal writes. Namely, a counter update is written to a first replica, then a read is performed and the result of that is replicated to the other nodes. With RF=1, since there is only one replica, no read is involved, but in a way it's a degenerate case. So there are two reasons why RF=2 is much slower than RF=1: 1) it involves a read to replicate, and that read takes time. Especially if that read hits the disk, it may even dominate the insertion time. 2) the replication to the first replica and the replication to the rest of the replicas are not done in parallel but sequentially. Note that this is only true for the first replica versus the others. In other words, from RF=2 to RF=3 you should not see a significant performance degradation. Note that while there is nothing you can do for 2), you can try to speed up 1) by using the row cache, for instance (in case you weren't). In other words, with counters it is expected that RF=1 be potentially much faster than RF>1. That is the way counters work. And don't get me wrong, I'm not suggesting you should use RF=1 at all. What I am saying is that the performance you see with RF=2 is the performance of counters in Cassandra. -- Sylvain

On Wed, Nov 28, 2012 at 7:34 AM, Sergey Olefir <solf.lists@> wrote: I think there might be a misunderstanding as to the nature of the problem. Say I have test set T, and two identical servers A and B.
- I tested that server A (singly) is able to handle the load of T. - I tested that server B (singly) is able to handle the load of T. - I then join A and B in a cluster and set replication=2 -- this means that each server in effect has to handle the full test load individually (because there are two servers and replication=2, each server effectively has to handle all the data written to the cluster). Under these circumstances it is reasonable to assume that cluster A+B should be able to handle load T, because each server is able to do so individually. HOWEVER, this is not the case. In fact, A+B together are only able to handle less than 1/3 of T, DESPITE the fact that A and B individually handle T just fine. I think there's something wrong with Cassandra replication (possibly as simple as me misconfiguring something) -- it shouldn't be three times faster to write to two separate nodes in parallel than to a 2-node Cassandra cluster with replication=2.

Edward Capriolo wrote: Say you are doing 100 inserts with RF=1 on two nodes. That is 50 inserts a node. If you go to RF=2 that is 100 inserts a node. If you were at 75% capacity on each node, you're now at 150%, which is not possible, so things bog down. To figure out what is going on we would need to see tpstats, iostat, and top information. I think you're looking at the performance the wrong way. Starting off at RF=1 is not the way to understand Cassandra performance. The benefits of scale-out don't happen until you fix your RF and increase your node count, i.e. 5 nodes at RF=3 is fast, 10 nodes at RF=3 even better.

On Tuesday, November 27, 2012, Sergey Olefir <solf.lists@> wrote: I already do a lot of in-memory aggregation before writing to Cassandra. The question here is what is wrong with Cassandra (or its configuration) that causes a huge performance drop when moving from 1-replication to 2-replication for counters -- and more importantly how to resolve the problem.
2x-3x drop when moving from 1-replication to 2-replication on two nodes is reasonable. 6x is not. Like I said, with this kind of performance degradation it makes more sense to run two clusters with replication=1 in parallel rather than rely on Cassandra replication. And yes, Rainbird was the inspiration for what we are trying to do here :) Edward Capriolo wrote Cassandra's counters read on increment. Additionally they are distributed so that can be multiple reads on increment. If they are not fast enough and you have avoided all tuning options add more servers to handle the load. In many cases incrementing the same counter n times can be avoided. Twitter's rainbird did just that. It avoided multiple counter increments by batching them. I have done a similar think using cassandra and Kafka.
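Edward's capacity arithmetic earlier in the thread (100 inserts at RF=1 across two nodes is 50 per node; at RF=2 it is 100 per node) can be written down directly. A minimal sketch -- the function name and numbers are illustrative, not from any Cassandra API:

```python
def per_node_write_load(client_writes_per_sec, replication_factor, num_nodes):
    """Each client write is applied on RF replicas, so the cluster performs
    client_writes * RF replica-writes, spread evenly across all nodes."""
    return client_writes_per_sec * replication_factor / num_nodes

# Two nodes, RF=1: the writes are split between the nodes.
assert per_node_write_load(100, 1, 2) == 50
# Two nodes, RF=2: every node sees every write -- double the per-node load.
assert per_node_write_load(100, 2, 2) == 100
```

This is why a node running near capacity at RF=1 can bog down completely at RF=2 on the same two-node cluster, independent of the counter-specific read-on-increment cost.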
Re: Java high-level client
First of all, it is backed by Netflix. They have used it in production for a long time, so it is pretty solid. Also, they have a nice tool (Priam) which makes Cassandra cloud (AWS) friendly. This is important for us. Andrey

On Wed, Nov 28, 2012 at 11:53 AM, Wei Zhu wz1...@yahoo.com wrote: We are using Hector now. What is the major advantage of Astyanax over Hector? Thanks. -Wei

From: Andrey Ilinykh ailin...@gmail.com To: user@cassandra.apache.org Sent: Wednesday, November 28, 2012 9:37 AM Subject: Re: Java high-level client +1

On Tue, Nov 27, 2012 at 10:10 AM, Michael Kjellman mkjell...@barracuda.com wrote: Netflix has a great client https://github.com/Netflix/astyanax
Re: counters + replication = awful performance?
Just for reference, HBase's counters also do a local read. I am not saying they work better/worse/faster/slower, but I would not expect any system that reads on increment to be significantly faster than what Cassandra does. I am just saying that your counter throughput is read-bound; this is not unique to C*'s implementation.
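Sylvain's description of the counter write path (write to the first replica, then a read, then replication of the read result to the remaining replicas) suggests a simple latency model, and it also shows why going from RF=2 to RF=3 should cost far less than going from RF=1 to RF=2: only the first replica's write+read is sequential with the rest. A rough sketch with made-up latencies, not measured Cassandra numbers:

```python
def counter_update_latency(write_ms, read_ms, rf):
    """Leader applies the write, reads the result, then replicates the
    read value to the remaining rf-1 replicas in parallel."""
    if rf == 1:
        return write_ms            # degenerate case: no replication read
    leader = write_ms + read_ms    # sequential: write, then local read
    replicate = write_ms           # rf-1 replica writes happen in parallel
    return leader + replicate

assert counter_update_latency(1.0, 5.0, 1) == 1.0   # RF=1
assert counter_update_latency(1.0, 5.0, 2) == 7.0   # RF=2: read dominates
# RF=2 -> RF=3 adds no latency in this model (parallel fan-out):
assert counter_update_latency(1.0, 5.0, 3) == 7.0
```

If the leader's read hits disk (large read_ms), it dominates the update time, which is why row caching can help and why the throughput is read-bound regardless of the storage engine.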
Re: Java high-level client
Lots of example code, a nice API, and good performance are the first things that come to mind for why I like Astyanax better than Hector.
Re: Java high-level client
Astyanax was the son of Hector, who was Cassandra's brother in Greek mythology. So the son is doing better than the father :) -Wei
Re: Java high-level client
Astyanax is a Hector fork. You can see many of the Hector authors' comments still in the Astyanax code. There is some nice stuff in there, but (IMHO) I do not see the fork as necessary. It has split up the community a bit, as there are now 3 high-level Java clients. I would advise following Josh's advice http://www.youtube.com/watch?v=nPG4sK_glls . Go to reddit and select whatever sexy technology is new and trending :)
Re: Java high-level client
Well, not really. Astyanax ('astu-wanax' in Mycenaean Greek, 'lord of the city') had his brains dashed out against the walls of Troy by Neoptolemus, son of Achilles. So the suck was universal. --DRS, possibly the only trained classicist using big Cassandra databases :)
Re: Java high-level client
CQL Datastax Java Driver for the win then...
Re: Generic questions over Cassandra 1.1/1.2
Compact storage is the schemaless of old. Right. That comes with the downside of picking one :) It does not seem that compact storage is the default choice for the future. As well as interop with the thrift/cli world, I also find it hard to reason about row caching with CQL-defined tables. I still work through thrift/cli as a result, which is a pity because CQL has a nice surface. Bill

On 28/11/12 01:32, Edward Capriolo wrote: @Bill Are you saying that now Cassandra is less schemaless? :) Compact storage is the schemaless of old.

On Tuesday, November 27, 2012, Bill de hÓra b...@dehora.net wrote: I'm not sure I always understand what people mean by "schemaless" exactly, and I'm curious. For 'schemaless', given this -

{{{
cqlsh> use example;
cqlsh:example> CREATE TABLE users (
           ...     user_name varchar,
           ...     password varchar,
           ...     gender varchar,
           ...     session_token varchar,
           ...     state varchar,
           ...     birth_year bigint,
           ...     PRIMARY KEY (user_name)
           ... );
}}}

I expect this would not cause an unknown identifier error -

{{{
INSERT INTO users (user_name, password, extra, moar) VALUES ('bob', 'secret', 'a', 'b');
}}}

but definitions vary. Bill

On 26/11/12 09:18, Sylvain Lebresne wrote: On Mon, Nov 26, 2012 at 8:41 AM, aaron morton aa...@thelastpickle.com wrote: Is there any noticeable performance difference between Thrift and CQL3? Off the top of my head it's within 5% (maybe 10%) under stress tests. See Eric's talk at the Cassandra SF conference for the exact numbers. Eric's benchmark result was that normal queries were slightly slower, but prepared ones (and in real life, I see no good reason not to prepare statements) were actually slightly faster. CQL 3 requires a schema; however, altering the schema is easier, and in 1.2 it will support concurrent schema modifications. The Thrift API is still schemaless.
Sorry to hijack this thread, but I'd be curious (like, seriously, I'm not trolling) to understand what you mean by "CQL 3 requires a schema" but "the Thrift API is still schemaless". Basically, I'm not sure I always understand what people mean by "schemaless" exactly, and I'm curious. -- Sylvain
Re: How to determine compaction bottlenecks
aaron morton aaron at thelastpickle.com writes:

I've been playing around with trying to figure out what is making compactions run so slow.

Is this regular compaction or table upgrades? I *think* upgradesstables is single threaded. Do you have some compaction log lines that say "Compacted to…"? It's handy to see the throughput and the number of keys compacted.

snapshot_before_compaction: false
in_memory_compaction_limit_in_mb: 256
multithreaded_compaction: true
compaction_throughput_mb_per_sec: 128
compaction_preheat_key_cache: true

What setting for concurrent_compactors? I would also check the logs for GC issues. Cheers - Aaron Morton, Freelance Cassandra Developer, New Zealand, @aaronmorton, http://www.thelastpickle.com

On 28/11/2012, at 4:23 AM, Derek Bromenshenkel derek.bromenshenkel at gmail.com wrote: Setup: C* 1.1.6, 6 nodes (Linux, 64GB RAM, 16 core CPU, 2x512 SSD), RF=3, 1.65TB total used. Background: Client app is off - no reads/writes happening. Doing some cluster maintenance requiring node repairs and upgradesstables. I've been playing around with trying to figure out what is making compactions run so slow. Watching syslogs, it seems to average 3-4MB/s. That just seems so slow for this setup, given that there is zero external load on the cluster. As far as I can tell:
1. Not I/O bound according to iostat data
2. CPU seems to be idling also
3. From my understanding, I am using all the correct compaction settings for this setup. Here they are:
snapshot_before_compaction: false
in_memory_compaction_limit_in_mb: 256
multithreaded_compaction: true
compaction_throughput_mb_per_sec: 128
compaction_preheat_key_cache: true
Some other thoughts: I have turned on DEBUG logging for the Throttle class and played with the live compaction_throughput_mb_per_sec setting. I can see it performing the throttling if I set the value low (say 4), but anything over 8 it is apparently running wide open. [Side note: Although the math for the Throttle class adds up, overall the throttling seems to be very, very conservative.] I also accidentally turned on DEBUG for the entire ...compaction.* package, and that unintentionally created A LOT of I/O from the ParallelCompactionIterable class, and the disk/OS handled that just fine. Perhaps I just don't fully grasp what is going on or have the correct expectations. I am OK with things being slow if the hardware is working hard, but that does not seem to be the case. Anyone have some insight? Thanks

Hi Aaron, thank you for taking the time and responding. I'll try to answer your questions.
- reg vs upgrade: Seeing the same speed on regular compaction and upgrades. True that most of the frustration comes from the upgrades, since there is so much work to be done.
- GC: Looked fine. I've seen pressure before, but only when under very heavy client app load.
- concurrent_compactors: Not set, so it should default to #cores [32; 16 physical * 2 hyperthreads], and I did see 32 CompactionExecutor (I think) threads via JMX.
- examples: Yes, I have a lot of examples. Here are some:

Leveled
INFO [CompactionExecutor:1033] 2012-11-26 01:38:56,800 CompactionTask.java (line 221) Compacted to [lcs1]. 35,058,450 to 33,408,896 (~95% of original) bytes for 127,771 keys at 4.371735MB/s. Time: 7,288ms.
INFO [CompactionExecutor:2015] 2012-11-26 03:12:43,800 CompactionTask.java (line 221) Compacted to [lcs2]. 37,029,581 to 36,747,459 (~99% of original) bytes for 135,471 keys at 3.748541MB/s. Time: 9,349ms.

Size Tiered
INFO [CompactionExecutor:6242] 2012-11-26 10:46:24,030 CompactionTask.java (line 221) Compacted to [abc]. 12,804,781,130 to 5,575,340,207 (~43% of original) bytes for 84,723,404 keys at 1.382544MB/s. Time: 3,845,851ms.
INFO [CompactionExecutor:288] 2012-11-26 00:42:58,629 CompactionTask.java (line 221) Compacted to [def]. 116,347,764 to 58,354,237 (~50% of original) bytes for 2,511,375 keys at 0.655612MB/s. Time: 84,884ms.
INFO [CompactionExecutor:5113] 2012-11-26 08:33:12,885 CompactionTask.java (line 221) Compacted to [ghi]. 560,682,371 to 294,965,985 (~52% of original) bytes for 220 keys at 3.172669MB/s. Time: 88,664ms.
INFO [CompactionExecutor:6124] 2012-11-26 09:36:52,141 CompactionTask.java (line 221) Compacted to [jkl]. 418,807,103 to 234,394,618 (~55% of original) bytes for 3,130,751 keys at 2.807220MB/s. Time: 79,629ms.

Also, upon reading the messages here/JIRA/etc., I decided to disable multithreaded_compaction late yesterday. That helped, to the tune of a 3-5x improvement. Why multithreaded is so much slower, I'm willing to set aside for now. However, I'm still interested in understanding why, under zero load in an unthrottled state, the compaction process does not consume at least one full CPU core and/or max out the disk I/O. Thanks again, Derek
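When eyeballing many "Compacted to" lines like the ones above, it can help to extract the numbers programmatically. A small sketch that parses the throughput fields out of a log line in the format quoted above (the regex is based only on these examples and may need adjusting for other Cassandra versions):

```python
import re

# One of the "Compacted to" lines quoted in the thread.
LINE = ("Compacted to [abc]. 12,804,781,130 to 5,575,340,207 "
        "(~43% of original) bytes for 84,723,404 keys at "
        "1.382544MB/s. Time: 3,845,851ms.")

def compaction_stats(line):
    """Pull bytes in/out, key count, throughput, and duration out of a
    CompactionTask 'Compacted to' log line."""
    m = re.search(
        r"([\d,]+) to ([\d,]+) .*?for ([\d,]+) keys at ([\d.]+)MB/s\. "
        r"Time: ([\d,]+)ms", line)
    num = lambda s: int(s.replace(",", ""))  # strip thousands separators
    return {
        "bytes_in": num(m.group(1)),
        "bytes_out": num(m.group(2)),
        "keys": num(m.group(3)),
        "mb_per_s": float(m.group(4)),
        "ms": num(m.group(5)),
    }

stats = compaction_stats(LINE)
assert stats["keys"] == 84723404
assert stats["mb_per_s"] == 1.382544
```

Feeding every such line from the logs through this makes it easy to see, for instance, that the size-tiered compactions above cluster well below the configured 128MB/s throttle.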
Re: counters + replication = awful performance?
On Wed, Nov 28, 2012 at 7:15 AM, Edward Capriolo edlinuxg...@gmail.com wrote: I may be wrong, but during a bootstrap hints can be silently discarded if the node they are destined for leaves the ring. Yeah: https://issues.apache.org/jira/browse/CASSANDRA-2434 A user like this might benefit from DANGER counters. They are not looking for perfection, only better performance, and the counter row keys themselves roll over in 5 minutes anyway. Yep, I agree that if you don't care about accurate counting, Cassandra counters may be for you. Cassandra counters in mongo mode are even more web scale! The unfortunate thing is that people seem to assume that software does what it is supposed to do, and probably do not get a great impression of said software when it doesn't. :D =Rob -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb