Re: Best way to do a multi_get using CQL
Yes, I am using the CQL datastax drivers. It was good advice, thanks a lot Jonathan. []s

2014-06-20 0:28 GMT-03:00 Jonathan Haddad j...@jonhaddad.com: The only case in which it might be better to use an IN clause is if the entire query can be satisfied from that machine. Otherwise, go async. The native driver reuses connections and intelligently manages the pool for you. It can also multiplex queries over a single connection. I am assuming you're using one of the datastax drivers for CQL, btw. Jon

On Thu, Jun 19, 2014 at 7:37 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: This is interesting, I didn't know that! It might make sense then to use select = + async + token aware; I will try to change my code. But would it be a recommended solution for these cases? Any other options? I still wonder if this is the right use case for Cassandra, looking up random keys in a huge cluster. After all, the number of connections to Cassandra will still be huge, right? Wouldn't that be a problem? Or does the driver reuse the connection when you use async? []s

2014-06-19 22:16 GMT-03:00 Jonathan Haddad j...@jonhaddad.com: If you use async and your driver is token aware, it will go to the proper node, rather than requiring the coordinator to do so. Realistically you're going to have a connection open to every server anyway. It's the difference between you querying for the data directly and using a coordinator as a proxy. It's faster to just ask the node with the data.

On Thu, Jun 19, 2014 at 6:11 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: But wouldn't using async queries be even worse than using SELECT IN? The justification in the docs is that I could query many nodes, but I would still do that. Today, I use both async queries AND SELECT IN:

    SELECT_ENTITY_LOOKUP = "SELECT entity_id FROM " + ENTITY_LOOKUP + " WHERE name=%s and value in(%s)"

    for name, values in identifiers.items():
        query = self.SELECT_ENTITY_LOOKUP % ('%s', ','.join(['%s'] * len(values)))
        args = [name] + values
        query_msg = query % tuple(args)
        futures.append((query_msg, self.session.execute_async(query, args)))

    for query_msg, future in futures:
        try:
            rows = future.result(timeout=10)
            for row in rows:
                entity_ids.add(row.entity_id)
        except:
            logging.error("Query '%s' returned ERROR" % (query_msg,))
            raise

Using async just with select = would mean that instead of 1 async query (example: in (0, 1, 2)), I would do several, one for each value of the values array above. In my head, this would mean more connections to Cassandra and the same amount of work, right? What would be the advantage? []s

2014-06-19 22:01 GMT-03:00 Jonathan Haddad j...@jonhaddad.com: Your other option is to fire off async queries. It's pretty straightforward w/ the java or python drivers.

On Thu, Jun 19, 2014 at 5:56 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: I was taking a look at the Cassandra anti-patterns list: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html Among them is "SELECT ... IN or index lookups: SELECT ... IN and index lookups (formerly secondary indexes) should be avoided except for specific scenarios. See When not to use IN in SELECT and When not to use an index in Indexing in CQL for Cassandra 2.0." And looking at the SELECT doc, I saw: "When not to use IN: The recommendations about when not to use an index apply to using IN in the WHERE clause. Under most conditions, using IN in the WHERE clause is not recommended. Using IN can degrade performance because usually many nodes must be queried. For example, in a single, local data center cluster having 30 nodes, a replication factor of 3, and a consistency level of LOCAL_QUORUM, a single key query goes out to two nodes, but if the query uses the IN condition, the number of nodes being queried is most likely even higher, up to 20 nodes depending on where the keys fall in the token range." In my system, I have a column family called entity_lookup:

    CREATE KEYSPACE IF NOT EXISTS Identification1
      WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3 };
    USE Identification1;
    CREATE TABLE IF NOT EXISTS entity_lookup (
      name varchar,
      value varchar,
      entity_id uuid,
      PRIMARY KEY ((name, value), entity_id));

And I use the following select to query it:

    SELECT entity_id FROM entity_lookup WHERE name=%s and value in(%s)

Is this an anti-pattern? If not using SELECT IN, which other way would you recommend for lookups like that?
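Jonathan's suggestion upthread (fully-keyed SELECT + async + token aware) could be sketched roughly like this. The helper and query text below are illustrative, not the poster's actual code; only execute_async is real DataStax driver API, and the session usage is shown as a comment since it needs a live cluster:

```python
# Sketch: expand one "value IN (...)" lookup into per-key queries so a
# token-aware driver can route each one directly to a replica.
# The query template mirrors the entity_lookup schema from the thread.

SELECT_ONE = "SELECT entity_id FROM entity_lookup WHERE name=%s AND value=%s"

def split_in_query(name, values):
    """Turn one IN(...) lookup into per-key (query, args) pairs.

    Each resulting query names a full partition key, so it can be
    satisfied by the node owning that token."""
    return [(SELECT_ONE, (name, value)) for value in values]

# With a live session, the fan-out would look roughly like:
#   futures = [session.execute_async(q, args)
#              for q, args in split_in_query('email', emails)]
#   entity_ids = {row.entity_id for f in futures for row in f.result()}
```

Note the driver multiplexes these over its existing per-host connections, so the fan-out does not open one connection per query.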
Sending BLOBs to Cassandra +
Hi, I read in Cassandra's FAQ that it is fine with BLOBs up to 64MB. Here I am trying to send a 1.6MB BLOB using CQL, and Cassandra rejects my query with the following message: Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Request is too big: length 409600086 exceeds maximum allowed length 268435456. Is there a better way than CQL to send BLOBs? Is there a way to concat them so that I can use several queries of the right size to upload my BLOB? I'd like to be able to send BLOBs up to 20MB. Thanks! -- Simon
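There is no built-in server-side concat, but a common workaround is to chunk the blob client-side into rows under one partition and reassemble on read. A minimal sketch, with a hypothetical table of my own naming:

```python
# Sketch (not a Cassandra feature): store a large blob as numbered
# chunks sharing a partition key, reassemble on read.
# Hypothetical table:
#   CREATE TABLE blobs (id uuid, seq int, data blob,
#                       PRIMARY KEY (id, seq));

CHUNK_SIZE = 1024 * 1024  # 1MB per chunk, well under the request limit

def chunk_blob(data, chunk_size=CHUNK_SIZE):
    """Split a bytes payload into ordered (seq, chunk) pairs."""
    return [(i // chunk_size, data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]

def reassemble(chunks):
    """Rebuild the original payload from (seq, chunk) pairs."""
    return b"".join(c for _, c in sorted(chunks))

# Writing is then one INSERT per chunk, e.g. (assumed session/blob_id):
#   for seq, chunk in chunk_blob(payload):
#       session.execute(
#           "INSERT INTO blobs (id, seq, data) VALUES (%s, %s, %s)",
#           (blob_id, seq, chunk))
```

Each individual write then stays far below both the native-protocol request limit and the commit log mutation limit discussed later in this thread.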
Bug on 2.1-rc1 with BLOBs?
Hi, When I am sending BLOBs _below_ the max query size (blob size=0.6MB), it works fine on Cassandra 2.0, but on 2.1-rc1 I get the following error within the Cassandra server (from the logs) and the query just dies:

WARN [SharedPool-Worker-2] 2014-06-20 10:06:00,263 AbstractTracingAwareExecutorService.java:166 - Uncaught exception on thread Thread[SharedPool-Worker-2,5,main]: {}
java.lang.RuntimeException: java.lang.IllegalArgumentException: Mutation of 122880122 bytes is too large for the maxiumum size of 16777216
    at org.apache.cassandra.service.StorageProxy$LocalMutationRunnable.run(StorageProxy.java:2052) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_05]
    at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:162) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:103) [apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_05]
Caused by: java.lang.IllegalArgumentException: Mutation of 122880122 bytes is too large for the maxiumum size of 16777216
    at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:205) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
    at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:192) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
    at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:374) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
    at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:354) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
    at org.apache.cassandra.db.Mutation.apply(Mutation.java:210) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
    at org.apache.cassandra.service.StorageProxy$7.runMayThrow(StorageProxy.java:958) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
    at org.apache.cassandra.service.StorageProxy$LocalMutationRunnable.run(StorageProxy.java:2048) ~[apache-cassandra-2.1.0-rc1.jar:2.1.0-rc1]
    ... 4 common frames omitted

I checked JIRA for something similar but didn't find anything (then again, I'm not sure which keywords to search for). Should I open an issue? Cheers, Simon
Re: Bug on 2.1-rc1 with BLOBs?
Hi Simon, On 20/06/14 10:18, Simon Chemouil wrote: When I am sending BLOBs _below_ the max query size (blob size=0.6MB), it works fine on Cassandra 2.0, but on 2.1-rc1 I get the following error within the Cassandra server (from the logs) and the query just dies: java.lang.RuntimeException: java.lang.IllegalArgumentException: Mutation of 122880122 bytes is too large for the maxiumum size of 16777216. 122880122 bytes is a lot more than 0.6MB... How are you sending your blob? Ciao, Duncan.
Re: Bug on 2.1-rc1 with BLOBs?
On 20/06/2014 10:41, Duncan Sands wrote: Hi Simon, 122880122 bytes is a lot more than 0.6MB... How are you sending your blob? Turns out there was a mistake in my code. The blob in this case was actually 122MB! Still, the same code works fine on Cassandra 2.0.x, so there might be a bug lurking, even if it's definitely above the recommended limit. Simon
Re: Sending BLOBs to Cassandra +
So it looks like I was sending more than I expected. Still, the question stands: is CQL the best way to send BLOBs? Are there any remote operations available on BLOBs? Thanks, Simon

On 20/06/2014 10:03, Simon Chemouil wrote: Hi, I read in Cassandra's FAQ that it is fine with BLOBs up to 64MB. Here I am trying to send a 1.6MB BLOB using CQL and Cassandra rejects my query: Request is too big: length 409600086 exceeds maximum allowed length 268435456. I'd like to be able to send BLOBs up to 20MB. Thanks!
Re: Bug on 2.1-rc1 with BLOBs?
For the record, I could reproduce the problem with blobs of size below 64MB: Caused by: java.lang.IllegalArgumentException: Mutation of 32000122 bytes is too large for the maxiumum size of 16777216. 32000122 bytes is just ~30MB, and it fails on 2.1-rc1 while it works on 2.0.x for even larger values (up to 64MB works fine). Simon

On 20/06/2014 11:00, Simon Chemouil wrote: Turns out there was a mistake in my code. The blob in this case was actually 122MB! Still, the same code works fine on Cassandra 2.0.x, so there might be a bug lurking.
Re: Bug on 2.1-rc1 with BLOBs?
OK, so Cassandra 2.1 now rejects writes it considers too big. It is possible to increase the limit by changing commitlog_segment_size_in_mb in cassandra.yaml. It defaults to 32MB, and the maximum mutation size for a write is half that value. From CommitLog.java:

    // we only permit records HALF the size of a commit log, to ensure we don't spin allocating many mostly
    // empty segments when writing large records
    private static final long MAX_MUTATION_SIZE = DatabaseDescriptor.getCommitLogSegmentSize() >> 1;

which explains (with the request overhead) why my ~30.5MB blob was rejected. Simon

On 20/06/2014 11:24, Simon Chemouil wrote: For the record, I could reproduce the problem with blobs of size below 64MB: Mutation of 32000122 bytes is too large for the maxiumum size of 16777216.
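The arithmetic above, spelled out as a small sketch (the function is mine, just mirroring the `>> 1` in CommitLog.java):

```python
# A mutation may be at most half the configured commit log segment size.

def max_mutation_size(commitlog_segment_size_in_mb=32):
    """Half the segment size, in bytes (the '>> 1' in CommitLog.java)."""
    return (commitlog_segment_size_in_mb * 1024 * 1024) >> 1

# Default 32MB segments give a 16777216-byte cap, matching the error
# message; a ~30.5MB mutation would need segments of at least ~64MB.
```

So raising commitlog_segment_size_in_mb to 64 would admit the ~30.5MB blob, though chunking the blob client-side is usually the safer route.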
Re: Best practices for repair
Thank you very much. I recompiled it with 2.0 and it works well; now I will try to figure out which granularity works better. Your example was really a boost, thanks again! Regards, Paolo

On 19/06/2014 22:42, Paulo Ricardo Motta Gomes wrote: Hello Paolo, I just published an open source version of the dsetool list_subranges command, which will enable you to perform subrange repair as described in the post. You can find the code and usage instructions here: https://github.com/pauloricardomg/cassandra-list-subranges Currently available for 1.2.16, but I guess that just changing the version in the pom.xml and recompiling will make it work on 2.0.x. Cheers, Paulo

On Thu, Jun 19, 2014 at 4:40 PM, Jack Krupansky j...@basetechnology.com wrote: The DataStax doc should be current best practices: http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html If you or anybody else finds it inadequate, speak up. -- Jack Krupansky

-Original Message- From: Paolo Crosato Sent: Thursday, June 19, 2014 10:13 AM To: user@cassandra.apache.org Subject: Best practices for repair

Hi everybody, we have some problems running repairs on a timely schedule. We have a three node deployment, and we start repair on one node every week, repairing one column family at a time. However, when we get to the big column families, repair sessions usually hang indefinitely and we have to restart them manually. The script runs commands like "nodetool repair keyspace columnfamily", one by one. This has not been a major issue for some time, since we never delete data, but we would like to sort the issue out once and for all. Reading resources on the net, I came to the conclusion that we could: 1) either run a repair session like the one above, but with the -pr switch, and run it on every node, not just on one; 2) or run subrange repair as described here: http://www.datastax.com/dev/blog/advanced-repair-techniques , which would be the best option. However, the latter procedure would require us to write some Java program that calls describe_splits to get the tokens to feed nodetool repair with. Is it true that the second procedure is available out of the box only in the commercial version of OpsCenter? I would like to know if these are the current best practices for repairs, or if there is some other option that makes repair easier to perform and more reliable than it is now. Regards, Paolo Crosato -- Paolo Crosato Software engineer/Custom Solutions e-mail: paolo.cros...@targaubiest.com

-- Paulo Motta Chaordic | Platform www.chaordic.com.br +55 48 3232.3200
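The subrange idea from the blog post boils down to splitting a node's primary token range into small pieces and repairing each with nodetool's -st/-et flags. The split itself is plain arithmetic over the token space; a rough sketch (describe_splits, or the list_subranges tool above, gives size-aware splits instead of the even split shown here):

```python
# Sketch: split a token range into N contiguous subranges, each of which
# could be repaired with "nodetool repair <ks> <cf> -st <start> -et <end>".

def split_range(start_token, end_token, parts):
    """Split (start_token, end_token] into `parts` contiguous subranges."""
    step = (end_token - start_token) // parts
    bounds = [start_token + i * step for i in range(parts)] + [end_token]
    return list(zip(bounds[:-1], bounds[1:]))

# e.g. driving nodetool from the shell (illustrative, not a real script):
#   for st, et in split_range(node_start, node_end, 16):
#       run("nodetool repair ks cf -st %d -et %d" % (st, et))
```

Smaller subranges mean each repair session validates less data, so a hung session costs less to restart, at the price of more sessions overall.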
Re: Best way to do a multi_get using CQL
However, my extensive benchmarking this week of the python driver from master shows a performance *decrease* when using 'token_aware'. This is on a 12-node, 2-datacenter, RF-3 cluster in AWS. Also, why do the work the coordinator will do for you: send all the queries, wait for everything to come back in whatever order, and sort the result? I would rather keep my app code simple. But the real point is that you should benchmark in your own environment. ml

On Fri, Jun 20, 2014 at 3:29 AM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: Yes, I am using the CQL datastax drivers. It was good advice, thanks a lot Jonathan. []s
Re: Batch of prepared statements exceeding specified threshold
The cluster is new, so no updates were done. Version 2.0.8. It happened when I did many writes (no reads). Writes are done in small batches of 2 inserts (writing to 2 column families). The values are big blobs (up to 100Kb). Any clues? Pavel

On Thu, Jun 19, 2014 at 8:07 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: Pavel, Out of curiosity, did it start to happen after some update? Which version of Cassandra are you using? []s

2014-06-19 16:10 GMT-03:00 Pavel Kogan pavel.ko...@cortica.com: What a coincidence! It happened in my cluster of 7 nodes today as well. Regards, Pavel

On Wed, Jun 18, 2014 at 11:13 AM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: I have a 10 node cluster with cassandra 2.0.8. I am getting these exceptions in the log when I run my code. What my code does is just read data from a CF and in some cases write new data:

WARN [Native-Transport-Requests:553] 2014-06-18 11:04:51,391 BatchStatement.java (line 228) Batch of prepared statements for [identification1.entity, identification1.entity_lookup] is of size 6165, exceeding specified threshold of 5120 by 1045.
WARN [Native-Transport-Requests:583] 2014-06-18 11:05:01,152 BatchStatement.java (line 228) Batch of prepared statements for [identification1.entity, identification1.entity_lookup] is of size 21266, exceeding specified threshold of 5120 by 16146.
WARN [Native-Transport-Requests:581] 2014-06-18 11:05:20,229 BatchStatement.java (line 228) Batch of prepared statements for [identification1.entity, identification1.entity_lookup] is of size 22978, exceeding specified threshold of 5120 by 17858.
INFO [MemoryMeter:1] 2014-06-18 11:05:32,682 Memtable.java (line 481) CFS(Keyspace='OpsCenter', ColumnFamily='rollups300') liveRatio is 14.249755859375 (just-counted was 9.85302734375). calculation took 3ms for 1024 cells

After some time, one node of the cluster goes down. Then it comes back after some seconds and another node goes down. It keeps happening, and there is always a node down in the cluster; when it comes back, another one falls. The only exception I see in the log is "connection reset by peer", which seems to be related to the gossip protocol when a node goes down. Any hint of what I could do to investigate this problem further? Best regards, Marcelo Valle.
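The warning in those logs fires when a batch's serialized size exceeds batch_size_warn_threshold_in_kb (the default gives the 5120 in the messages). A rough client-side sketch of the usual mitigation, with helper names of my own: estimate the payload size and fall back to individual async writes for large values instead of batching them.

```python
# Sketch: crude pre-check against the batch size warning threshold.
# The threshold constant mirrors the 5120-byte figure from the logs.

WARN_THRESHOLD = 5 * 1024  # bytes

def exceeds_warn_threshold(payloads, threshold=WARN_THRESHOLD):
    """Rough size check: sum of value sizes, ignoring statement overhead."""
    return sum(len(p) for p in payloads) > threshold

# With ~100Kb blobs, even a 2-statement batch trips the warning, so
# (hypothetical session/statement names):
#   if exceeds_warn_threshold([blob_a, blob_b]):
#       futures = [session.execute_async(insert_stmt, (key, blob))
#                  for key, blob in rows]  # one async write per statement
```

Batches in Cassandra buy atomicity, not throughput; for unrelated large writes, separate async statements put less pressure on the coordinator's heap.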
Re: Bug on 2.1-rc1 with BLOBs?
Thanks Simon for the info. I didn't know that the maximum payload size is related to the commit log config, interesting...

On Fri, Jun 20, 2014 at 11:39 AM, Simon Chemouil schemo...@gmail.com wrote: OK, so Cassandra 2.1 now rejects writes it considers too big. It is possible to increase the limit by changing commitlog_segment_size_in_mb in cassandra.yaml. It defaults to 32MB, and the maximum mutation size for a write is half that value, which explains (with the request overhead) why my ~30.5MB blob was rejected. Simon
Re: Best way to do a multi_get using CQL
I've found that if you have any amount of latency between your client and nodes, and you are executing a large batch of queries, you'll usually want to send them together to one node unless execution time is of no concern. The tradeoff is resource usage on the connected node vs. time to complete all the queries, because you'll need fewer client-to-node network round trips. With large numbers of queries you will still want to make sure you split them into manageable batches before sending them, to control memory usage on the executing node. I've been limiting queries to batches of 100 keys in scenarios like this.

On Fri, Jun 20, 2014 at 5:59 AM, Laing, Michael michael.la...@nytimes.com wrote: However, my extensive benchmarking this week of the python driver from master shows a performance *decrease* when using 'token_aware'. This is on a 12-node, 2-datacenter, RF-3 cluster in AWS. But the real point is that you should benchmark in your own environment. ml
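Jeremy's 100-key cap is simple to apply client-side; a small helper of my own naming, shown with a hypothetical session call:

```python
# Sketch: cap multi-key queries at a fixed batch size so no single
# request forces the executing node to materialize too many rows.

def in_batches(keys, batch_size=100):
    """Yield successive slices of at most batch_size keys."""
    for i in range(0, len(keys), batch_size):
        yield keys[i:i + batch_size]

# e.g. (illustrative; query/session are assumptions):
#   for batch in in_batches(all_keys):
#       rows = session.execute(query_with_in_clause, [name] + batch)
```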
Re: Batch of prepared statements exceeding specified threshold
Pavel, In my case, the heap was filling up faster than it was draining. I am still looking for the cause, as I could drain really fast with SSDs. In your case, though, you could check (AFAIK) nodetool tpstats and see if there are too many pending write tasks, for instance. Maybe you really are writing more than the nodes are able to flush to disk. How many writes per second are you achieving? Also, I would look for GCInspector in the log:

    cat system.log* | grep GCInspector | wc -l
    tail -1000 system.log | grep GCInspector

Do you see it running a lot? Is it taking much more time each time it runs? I am no Cassandra expert, but I would try these things first and post the results here. Maybe other people on the list have more ideas. Best regards, Marcelo.

2014-06-20 8:50 GMT-03:00 Pavel Kogan pavel.ko...@cortica.com: The cluster is new, so no updates were done. Version 2.0.8. It happened when I did many writes (no reads). Writes are done in small batches of 2 inserts (writing to 2 column families). The values are big blobs (up to 100Kb). Any clues? Pavel
Re: Bug on 2.1-rc1 with BLOBs?
On Fri, Jun 20, 2014 at 2:39 AM, Simon Chemouil schemo...@gmail.com wrote: OK, so Cassandra 2.1 now rejects writes it considers too big. It is possible to increase the value by changing commitlog_segment_size_in_mb in cassandra.yaml. It defaults to 32MB, and the maximum segment size for a write is half that value: The previous behavior, IIRC, was to just not commitlog the gigantic thing... so that's probably a good change. :) =Rob
Re: Best way to do a multi_get using CQL
A question, not sure if you guys know the answer: Supose I async query 1000 rows using token aware and suppose I have 10 nodes. Suppose also each node would receive 100 row queries each. How does async work in this case? Would it send each row query to each node in a different connection? Different message? I guess if there was a way to use batch with async, once you commit the batch for the 1000 queries, it would create 1 connection to each host and query 100 rows in a single message to each host. This would decrease resource usage, am I wrong? []s 2014-06-20 12:12 GMT-03:00 Jeremy Jongsma jer...@barchart.com: I've found that if you have any amount of latency between your client and nodes, and you are executing a large batch of queries, you'll usually want to send them together to one node unless execution time is of no concern. The tradeoff is resource usage on the connected node vs. time to complete all the queries, because you'll need fewer client - node network round trips. With large numbers of queries you will still want to make sure you split them into manageable batches before sending them, to control memory usage on the executing node. I've been limiting queries to batches of 100 keys in scenarios like this. On Fri, Jun 20, 2014 at 5:59 AM, Laing, Michael michael.la...@nytimes.com wrote: However my extensive benchmarking this week of the python driver from master shows a performance *decrease* when using 'token_aware'. This is on 12-node, 2-datacenter, RF-3 cluster in AWS. Also why do the work the coordinator will do for you: send all the queries, wait for everything to come back in whatever order, and sort the result. I would rather keep my app code simple. But the real point is that you should benchmark in your own environment. ml On Fri, Jun 20, 2014 at 3:29 AM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: Yes, I am using the CQL datastax drivers. It was a good advice, thanks a lot Janathan. 
Today, I use both async queries AND SELECT IN:

    SELECT_ENTITY_LOOKUP = ("SELECT entity_id FROM " + ENTITY_LOOKUP +
                            " WHERE name=%s AND value IN (%s)")

    for name, values in identifiers.items():
        query = self.SELECT_ENTITY_LOOKUP % ('%s', ','.join(['%s'] * len(values)))
        args = [name] + values
        query_msg = query % tuple(args)
        futures.append((query_msg, self.session.execute_async(query, args)))

    for query_msg, future in futures:
        try:
            rows = future.result(timeout=10)
            for row in rows:
                entity_ids.add(row.entity_id)
        except Exception:
            logging.error("Query '%s' returned ERROR" % query_msg)
            raise

Using async just with select = would mean that instead of 1 async query (example: IN (0, 1, 2)), I would do several, one for each value of the values array above. In my head, this would mean more connections to Cassandra and the same amount of work, right? What would be the advantage? []s

2014-06-19 22:01 GMT-03:00 Jonathan Haddad j...@jonhaddad.com: Your other option is to fire off async queries. It's pretty straightforward w/ the java or python drivers.
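For comparison, the per-key alternative Jonathan suggests (one async select-by-equality per value instead of one IN per name) looks roughly like this. The driver call is replaced by a hypothetical fake_execute so the sketch is self-contained; with the real DataStax python-driver, execute_async multiplexes these requests over a small set of pooled connections rather than opening one connection per query:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for session.execute_async("SELECT ... WHERE name=%s
# AND value=%s", [name, value]) -- returns one row per (name, value) key.
def fake_execute(name, value):
    return [("entity-for-%s-%s" % (name, value),)]

identifiers = {"email": ["a@x.com", "b@x.com"], "phone": ["555-0100"]}

entity_ids = set()
with ThreadPoolExecutor(max_workers=4) as pool:
    # One future per (name, value) pair: the "select = + async" pattern,
    # replacing the single IN (...) query per name.
    futures = [pool.submit(fake_execute, name, value)
               for name, values in identifiers.items()
               for value in values]
    for future in futures:
        for row in future.result(timeout=10):
            entity_ids.add(row[0])

print(len(entity_ids))  # 3 -- one result per key queried
```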
Re: Best way to do a multi_get using CQL
That depends on the connection pooling implementation in your driver. Astyanax will keep N connections open to each node (configurable) and route each query in a separate message over an existing connection, waiting until one becomes available if all are in use.
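The pooling behavior Jeremy describes (N connections per node, callers waiting when all are in use) can be sketched with a small blocking pool; the class and names are illustrative, not Astyanax's actual API:

```python
from queue import Queue

class NodePool:
    """Keep up to `size` connections to one node; borrowers block until a
    connection is free, mirroring the behavior described above. Connection
    objects here are plain strings for illustration."""
    def __init__(self, node, size=3):
        self._conns = Queue()
        for i in range(size):
            self._conns.put("%s#conn%d" % (node, i))

    def execute(self, query):
        conn = self._conns.get()        # blocks while all connections are busy
        try:
            return "result(%s via %s)" % (query, conn)
        finally:
            self._conns.put(conn)       # hand the connection back for reuse

pool = NodePool("10.0.0.1", size=2)
print(pool.execute("SELECT ..."))  # result(SELECT ... via 10.0.0.1#conn0)
```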
Re: Batch of prepared statements exceeding specified threshold
Hi Marcelo, No pending write tasks. I am writing a lot: about 100-200 writes, each up to 100Kb, every 15s. It is running on a decent cluster of 5 identical nodes: quad-core i7 with 32Gb RAM and 480Gb SSD. Regards, Pavel

On Fri, Jun 20, 2014 at 12:31 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: Pavel, In my case, the heap was filling up faster than it was draining. I am still looking for the cause of it, as I could drain really fast with SSD. However, in your case you could check (AFAIK) nodetool tpstats and see if there are too many pending write tasks, for instance. Maybe you really are writing more than the nodes are able to flush to disk. How many writes per second are you achieving? Also, I would look for GCInspector in the log: cat system.log* | grep GCInspector | wc -l ; tail -1000 system.log | grep GCInspector. Do you see it running a lot? Is it taking much more time each time it runs? I am no Cassandra expert, but I would try these things first and post the results here. Maybe other people on the list have more ideas. Best regards, Marcelo.

2014-06-20 8:50 GMT-03:00 Pavel Kogan pavel.ko...@cortica.com: The cluster is new, so no updates were done. Version 2.0.8. It happened when I did many writes (no reads). Writes are done in small batches of 2 inserts (writing to 2 column families). The values are big blobs (up to 100Kb). Any clues? Pavel

On Thu, Jun 19, 2014 at 8:07 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: Pavel, Out of curiosity, did it start to happen after some update? Which version of Cassandra are you using? []s

2014-06-19 16:10 GMT-03:00 Pavel Kogan pavel.ko...@cortica.com: What a coincidence! It happened in my cluster of 7 nodes today as well. Regards, Pavel

On Wed, Jun 18, 2014 at 11:13 AM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: I have a 10-node cluster with Cassandra 2.0.8. I am getting these exceptions in the log when I run my code.
My code just reads data from a CF and in some cases writes new data.

    WARN [Native-Transport-Requests:553] 2014-06-18 11:04:51,391 BatchStatement.java (line 228) Batch of prepared statements for [identification1.entity, identification1.entity_lookup] is of size 6165, exceeding specified threshold of 5120 by 1045.
    WARN [Native-Transport-Requests:583] 2014-06-18 11:05:01,152 BatchStatement.java (line 228) Batch of prepared statements for [identification1.entity, identification1.entity_lookup] is of size 21266, exceeding specified threshold of 5120 by 16146.
    WARN [Native-Transport-Requests:581] 2014-06-18 11:05:20,229 BatchStatement.java (line 228) Batch of prepared statements for [identification1.entity, identification1.entity_lookup] is of size 22978, exceeding specified threshold of 5120 by 17858.
    INFO [MemoryMeter:1] 2014-06-18 11:05:32,682 Memtable.java (line 481) CFS(Keyspace='OpsCenter', ColumnFamily='rollups300') liveRatio is 14.249755859375 (just-counted was 9.85302734375). calculation took 3ms for 1024 cells

After some time, one node of the cluster goes down. It comes back after some seconds and another node goes down. This keeps happening, and there is always a node down in the cluster; when one comes back, another one falls. The only exception I see in the log is "connection reset by peer", which seems to be related to the gossip protocol when a node goes down. Any hint of what I could do to investigate this problem further? Best regards, Marcelo Valle.
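To see how far over the 5120-byte threshold the batches actually are, those WARN lines can be parsed directly; this is a quick illustrative helper, not part of any Cassandra tooling:

```python
import re

# Matches the size figures in the BatchStatement warning shown above.
WARN_RE = re.compile(r"is of size (\d+), exceeding specified threshold of (\d+) by (\d+)")

line = ("WARN ... Batch of prepared statements for [identification1.entity, "
        "identification1.entity_lookup] is of size 6165, exceeding specified "
        "threshold of 5120 by 1045.")

size, threshold, excess = (int(g) for g in WARN_RE.search(line).groups())
print(size, threshold, excess)  # 6165 5120 1045
assert size - threshold == excess  # the excess figure is just size minus threshold
```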
Custom snitch classpath?
Where do I add my custom snitch JAR to the Cassandra classpath so I can use it?
Re: Batch of prepared statements exceeding specified threshold
If you have 32Gb RAM, the heap is probably 8Gb. 200 writes of 100Kb per second would be 20MB/s in the worst case, supposing all writes of a replica go to a single node. I really don't see any reason why it should be filling up the heap. Anyone else? But did you check the logs for the GCInspector? In my case, nodes are falling because of the heap; in your case, maybe it's something else. Do you see increased times when looking for GCInspector in the logs? []s
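Marcelo's worst-case estimate checks out arithmetically (numbers from the thread; the helper is just for illustration):

```python
def worst_case_mb_per_sec(writes_per_sec, write_size_kb):
    """Worst-case ingest on one node if every write of a replica lands there."""
    return writes_per_sec * write_size_kb / 1024.0

# 200 writes/s of 100Kb each, as discussed above:
print(worst_case_mb_per_sec(200, 100))  # 19.53125 -- roughly the 20MB/s cited
```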
Re: Custom snitch classpath?
The lib directory (where all the other jars are). bin/cassandra.in.sh does this:

    for jar in $CASSANDRA_HOME/lib/*.jar; do
        CLASSPATH=$CLASSPATH:$jar
    done

On Fri, Jun 20, 2014 at 12:58 PM, Jeremy Jongsma jer...@barchart.com wrote: Where do I add my custom snitch JAR to the Cassandra classpath so I can use it?

-- Tyler Hobbs DataStax http://datastax.com/
Re: Batch of prepared statements exceeding specified threshold
I think some figures from nodetool tpstats and nodetool compactionstats would help us see things more clearly. And Pavel, when you said batch, did you mean a LOGGED batch or an UNLOGGED batch?
Re: Custom snitch classpath?
Sharing in case anyone else wants to use this: https://github.com/barchart/cassandra-plugins/blob/master/src/main/java/com/barchart/cassandra/plugins/snitch/GossipingPropertyFileWithEC2FallbackSnitch.java Basically it is a proxy that attempts to use GossipingPropertyFileSnitch, and if that fails to initialize due to missing rack or datacenter values, it falls back to Ec2MultiRegionSnitch. We are using it for hybrid cloud deployments between AWS and our private datacenter.
Re: Custom snitch classpath?
This is nice! I was looking for something like this to implement a multi-DC cluster between OVH and Amazon. Thanks for sharing! []s
Re: Best way to do a multi_get using CQL
I am using python + the CQL driver. I wonder how they do it... These things seem like small details, but they are fundamental to getting good performance out of Cassandra... I wish there were a simpler way to query in batches. Opening a large number of connections and sending 1 message at a time seems bad to me, as sometimes you want to work with small rows. It's no surprise Cassandra performs better with average row sizes. But honestly, I disagree with this part of the Cassandra/driver design. []s
Re: Best way to do a multi_get using CQL
Well, it's kind of a trade-off. Either you send data directly to the primary replica nodes to take advantage of data locality using a token-aware strategy, and the price to pay is a high number of open connections on the client side; or you just batch data to a random node playing the coordinator role, which dispatches requests to the right nodes, and the price to pay is spike load on 1 node (the coordinator) plus intra-cluster bandwidth usage. The choice is yours; it has nothing to do with good or bad design.
Re: Best way to do a multi_get using CQL
There is nothing preventing that in Cassandra; it's just a matter of how intelligent the driver API is. Submit a feature request to the Astyanax or DataStax driver projects.

On Fri, Jun 20, 2014 at 2:27 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: The bad design part (just my opinion, no intention to offend) is not allowing the possibility of sending batches directly to the data nodes, without using a coordinator. I would choose that option. []s
On Fri, Jun 20, 2014 at 12:32 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: A question, not sure if you guys know the answer: Suppose I async query 1000 rows using token aware and suppose I have 10 nodes. Suppose also that each node would receive 100 row queries. How does async work in this case? Would it send each row query to each node in a different connection? A different message? I guess if there was a way to use batch with async, once you commit the batch for the 1000 queries, it would create 1 connection to each host and query 100 rows in a single message to each host. This would decrease resource usage, am I wrong? []s 2014-06-20 12:12 GMT-03:00 Jeremy Jongsma jer...@barchart.com: I've found that if you have any amount of latency between your client and nodes, and you are executing a large batch of queries, you'll usually want to send them together to one node unless execution time is of no concern. The tradeoff is resource usage on the connected node vs. time to complete all the queries, because you'll need fewer client-to-node network round trips. With large numbers of queries you will still want to make sure you split them into manageable batches before sending them, to control memory usage on the executing node. I've been limiting queries to batches of 100 keys in scenarios like this. On Fri, Jun 20, 2014 at 5:59 AM, Laing, Michael michael.la...@nytimes.com wrote: However my extensive benchmarking this week of the python driver from master shows a performance *decrease* when using 'token_aware'. This is on a 12-node, 2-datacenter, RF-3 cluster in AWS. Also, why do the work the coordinator will do for you: send all the queries, wait for everything to come back in whatever order, and sort the result. I would rather keep my app code simple. But the real point is that you should benchmark in your own environment. ml On Fri, Jun 20, 2014 at 3:29 AM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: Yes, I am using the CQL datastax drivers. 
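Jeremy's advice above about splitting large query sets into manageable batches (he mentions limiting to roughly 100 keys) is easy to apply before building the IN clauses or the async fan-out. A small helper sketch — the names here are illustrative, not from the original code:

```python
# Split a list of keys into chunks of at most `size`, so each IN(...)
# query (or async batch) stays at a manageable size for the node that
# executes it -- 100 keys per chunk, per the advice in the thread.
def chunk(keys, size=100):
    return [keys[i:i + size] for i in range(0, len(keys), size)]

keys = list(range(1000))
batches = chunk(keys)
assert len(batches) == 10
assert all(len(b) == 100 for b in batches)
```

Each chunk would then be passed to one prepared statement via execute_async, as in the code earlier in the thread.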
Re: Best way to do a multi_get using CQL
I forgot to add that each connection can handle multiple simultaneous queries. This was part of the original protocol as of C* 1.2: http://www.datastax.com/dev/blog/binary-protocol Asynchronous: each connection can handle more than one active request at the same time. In practice, this means that client libraries will only need to maintain a relatively low number of open connections to a given Cassandra node to achieve good performance. This particularly matters with Cassandra, where a client usually wants to keep connections to all (or at least a good part of) the nodes of the cluster, and so having a low number of per-node connections helps scaling to large clusters. Technically, this is achieved by giving each message a stream ID, and by having responses to a request preserve the request’s stream ID. Clients can thus send multiple requests with different stream IDs on the same connection (i.e. without waiting for the response to a request to send the next one) while still being able to associate each received response to the right request, even if said responses come in a different order than the one in which requests were submitted. That asynchronicity is of course optional in the sense that a client library can still choose to use the protocol in a synchronous way if that is simpler. On Fri, Jun 20, 2014 at 12:30 PM, Jeremy Jongsma jer...@barchart.com wrote: There is nothing preventing that in Cassandra, it's just a matter of how intelligent the driver API is. Submit a feature request to the Astyanax or Datastax driver projects. On Fri, Jun 20, 2014 at 2:27 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: The bad design part (just my opinion, no intention to offend) is not allowing the possibility of sending batches directly to the data nodes, without using a coordinator. I would choose that option. []s 2014-06-20 16:05 GMT-03:00 DuyHai Doan doanduy...@gmail.com: Well it's kind of a trade-off. 
Either you send data directly to the primary replica nodes to take advantage of data locality using a token-aware strategy, and the price to pay is a high number of opened connections from the client side. Or you just batch data to a random node playing the coordinator role to dispatch requests to the right nodes. The price to pay is then spike load on 1 node (the coordinator) and intra-cluster bandwidth usage. The choice is yours, it has nothing to do with good or bad design. On Fri, Jun 20, 2014 at 8:55 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: I am using python + CQL Driver. I wonder how they do... These things seem unimportant, but they are fundamental to getting good performance in Cassandra... I wish there was a simpler way to query in batches. Opening a large number of connections and sending 1 message at a time seems bad to me, as sometimes you want to work with small rows. It's no surprise Cassandra performs better when we use average row sizes. But honestly I disagree with this part of Cassandra/Driver's design. []s 2014-06-20 14:37 GMT-03:00 Jeremy Jongsma jer...@barchart.com: That depends on the connection pooling implementation in your driver. Astyanax will keep N connections open to each node (configurable) and route each query in a separate message over an existing connection, waiting until one becomes available if all are in use. On Fri, Jun 20, 2014 at 12:32 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: A question, not sure if you guys know the answer: Suppose I async query 1000 rows using token aware and suppose I have 10 nodes. Suppose also that each node would receive 100 row queries. How does async work in this case? Would it send each row query to each node in a different connection? A different message? I guess if there was a way to use batch with async, once you commit the batch for the 1000 queries, it would create 1 connection to each host and query 100 rows in a single message to each host. 
This would decrease resource usage, am I wrong? []s 2014-06-20 12:12 GMT-03:00 Jeremy Jongsma jer...@barchart.com: I've found that if you have any amount of latency between your client and nodes, and you are executing a large batch of queries, you'll usually want to send them together to one node unless execution time is of no concern. The tradeoff is resource usage on the connected node vs. time to complete all the queries, because you'll need fewer client-to-node network round trips. With large numbers of queries you will still want to make sure you split them into manageable batches before sending them, to control memory usage on the executing node. I've been limiting queries to batches of 100 keys in scenarios like this. On Fri, Jun 20, 2014 at 5:59 AM, Laing, Michael michael.la...@nytimes.com wrote: However my extensive benchmarking this week of the python driver from master shows a performance decrease when using 'token_aware'. This is on a 12-node, 2-datacenter, RF-3 cluster in AWS.
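The stream-ID multiplexing described in the binary-protocol excerpt above can be modeled in a few lines: the client tags each outgoing request with an ID and matches responses back by that ID, so replies may arrive in any order over one connection. This is a toy model of the mechanism, not actual driver code:

```python
# Minimal model of protocol stream IDs: each request is tagged with an
# ID, and responses -- possibly arriving out of order -- are matched
# back to their request by that same ID. This is why one connection
# can carry many in-flight queries.
class StreamMultiplexer:
    def __init__(self):
        self._next_id = 0
        self._pending = {}          # stream_id -> request payload

    def send(self, request):
        stream_id = self._next_id
        self._next_id += 1
        self._pending[stream_id] = request
        return stream_id

    def receive(self, stream_id, response):
        # Match the response to its original request by stream ID.
        request = self._pending.pop(stream_id)
        return request, response

mux = StreamMultiplexer()
a = mux.send("SELECT ... WHERE id=1")
b = mux.send("SELECT ... WHERE id=2")
# Responses can come back in reverse order on the same connection:
req2, _ = mux.receive(b, "rows-for-2")
req1, _ = mux.receive(a, "rows-for-1")
assert req1.endswith("id=1") and req2.endswith("id=2")
```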
Re: Batch of prepared statements exceeding specified threshold
Logged batch. On Fri, Jun 20, 2014 at 2:13 PM, DuyHai Doan doanduy...@gmail.com wrote: I think some figures from nodetool tpstats and nodetool compactionstats may help in seeing things more clearly. And Pavel, when you said batch, did you mean a LOGGED batch or an UNLOGGED batch? On Fri, Jun 20, 2014 at 8:02 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: If you have 32 Gb RAM, the heap is probably 8 Gb. 200 writes of 100 kB per second would be 20 MB/s in the worst case, supposing all writes of a replica go to a single node. I really don't see any reason why it should be filling up the heap. Anyone else? But did you check the logs for the GCInspector? In my case, nodes are falling because of the heap; in your case, maybe it's something else. Do you see increased times when looking for GCInspector in the logs? []s 2014-06-20 14:51 GMT-03:00 Pavel Kogan pavel.ko...@cortica.com: Hi Marcelo, No pending write tasks. I am writing a lot, about 100-200 writes each up to 100 kB every 15[s]. It is running on a decent cluster of 5 identical nodes, quad-core i7 with 32Gb RAM and 480Gb SSD. Regards, Pavel On Fri, Jun 20, 2014 at 12:31 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: Pavel, In my case, the heap was filling up faster than it was draining. I am still looking for the cause of it, as I could drain really fast with SSD. However, in your case you could check (AFAIK) nodetool tpstats and see if there are too many pending write tasks, for instance. 
2014-06-20 8:50 GMT-03:00 Pavel Kogan pavel.ko...@cortica.com: The cluster is new, so no updates were done. Version 2.0.8. It happened when I did many writes (no reads). Writes are done in small batches of 2 inserts (writing to 2 column families). The values are big blobs (up to 100 kB). Any clues? Pavel On Thu, Jun 19, 2014 at 8:07 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: Pavel, Out of curiosity, did it start to happen before some update? Which version of Cassandra are you using? []s 2014-06-19 16:10 GMT-03:00 Pavel Kogan pavel.ko...@cortica.com: What a coincidence! It happened today in my cluster of 7 nodes as well. Regards, Pavel On Wed, Jun 18, 2014 at 11:13 AM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: I have a 10 node cluster with cassandra 2.0.8. I am getting these exceptions in the log when I run my code. What my code does is just read data from a CF and in some cases write new data. WARN [Native-Transport-Requests:553] 2014-06-18 11:04:51,391 BatchStatement.java (line 228) Batch of prepared statements for [identification1.entity, identification1.entity_lookup] is of size 6165, exceeding specified threshold of 5120 by 1045. WARN [Native-Transport-Requests:583] 2014-06-18 11:05:01,152 BatchStatement.java (line 228) Batch of prepared statements for [identification1.entity, identification1.entity_lookup] is of size 21266, exceeding specified threshold of 5120 by 16146. WARN [Native-Transport-Requests:581] 2014-06-18 11:05:20,229 BatchStatement.java (line 228) Batch of prepared statements for [identification1.entity, identification1.entity_lookup] is of size 22978, exceeding specified threshold of 5120 by 17858. INFO [MemoryMeter:1] 2014-06-18 11:05:32,682 Memtable.java (line 481) CFS(Keyspace='OpsCenter', ColumnFamily='rollups300') liveRatio is 14.249755859375 (just-counted was 9.85302734375). calculation took 3ms for 1024 cells After some time, one node of the cluster goes down. 
Then it goes back after some seconds and another node goes down. It keeps happening and there is always a node down in the cluster; when it goes back, another one falls. The only exceptions I see in the log are 'connection reset by peer', which seem to be related to the gossip protocol, when a node goes down. Any hint of what I could do to investigate this problem further? Best regards, Marcelo Valle.
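Marcelo's GCInspector checks above (grep-ing system.log and watching whether the pauses grow) can also be scripted. A small Python sketch; the sample log lines below are made up for illustration, and the real format varies by Cassandra version:

```python
# Count GCInspector log lines and keep the most recent few, mirroring
# `grep GCInspector | wc -l` and `tail ... | grep GCInspector`.
def gc_summary(lines, keep=5):
    hits = [line for line in lines if "GCInspector" in line]
    return len(hits), hits[-keep:]

# Made-up sample lines, roughly in the 2.0.x log style:
sample_log = [
    "INFO [main] StorageService.java ... starting up",
    "INFO [ScheduledTasks:1] GCInspector.java GC for ParNew: 210 ms",
    "INFO [ScheduledTasks:1] GCInspector.java GC for ConcurrentMarkSweep: 1800 ms",
]
count, recent = gc_summary(sample_log)
assert count == 2   # two GC pause reports in the sample
```

If the counts climb quickly and the reported pause times grow between runs, the heap is the likely suspect, as discussed above.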
Re: Batch of prepared statements exceeding specified threshold
Ok, in my case it was straightforward. It is just a warning, which however says that batches with a large data size (above 5 kB) can sometimes lead to node instability (why?). This limit seems to be hard-coded; I didn't find any way to configure it externally. Anyway, removing the batch and giving up atomicity resolved the issue for me. http://mail-archives.apache.org/mod_mbox/cassandra-commits/201404.mbox/%3ceee5dd5bc4794ef0b5c5153fdb583...@git.apache.org%3E On Fri, Jun 20, 2014 at 3:55 PM, Pavel Kogan pavel.ko...@cortica.com wrote: Logged batch. On Fri, Jun 20, 2014 at 2:13 PM, DuyHai Doan doanduy...@gmail.com wrote: I think some figures from nodetool tpstats and nodetool compactionstats may help in seeing things more clearly. And Pavel, when you said batch, did you mean a LOGGED batch or an UNLOGGED batch? On Fri, Jun 20, 2014 at 8:02 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: If you have 32 Gb RAM, the heap is probably 8 Gb. 200 writes of 100 kB per second would be 20 MB/s in the worst case, supposing all writes of a replica go to a single node. I really don't see any reason why it should be filling up the heap. Anyone else? But did you check the logs for the GCInspector? In my case, nodes are falling because of the heap; in your case, maybe it's something else. Do you see increased times when looking for GCInspector in the logs? []s 2014-06-20 14:51 GMT-03:00 Pavel Kogan pavel.ko...@cortica.com: Hi Marcelo, No pending write tasks. I am writing a lot, about 100-200 writes each up to 100 kB every 15[s]. It is running on a decent cluster of 5 identical nodes, quad-core i7 with 32Gb RAM and 480Gb SSD. Regards, Pavel On Fri, Jun 20, 2014 at 12:31 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: Pavel, In my case, the heap was filling up faster than it was draining. I am still looking for the cause of it, as I could drain really fast with SSD. However, in your case you could check (AFAIK) nodetool tpstats and see if there are too many pending write tasks, for instance. 
Maybe you really are writing more than the nodes are able to flush to disk. How many writes per second are you achieving? Also, I would look for GCInspector in the log: cat system.log* | grep GCInspector | wc -l tail -1000 system.log | grep GCInspector Do you see it running a lot? Is it taking much more time to run each time it runs? I am no Cassandra expert, but I would try these things first and post the results here. Maybe other people on the list have more ideas. Best regards, Marcelo. 2014-06-20 8:50 GMT-03:00 Pavel Kogan pavel.ko...@cortica.com: The cluster is new, so no updates were done. Version 2.0.8. It happened when I did many writes (no reads). Writes are done in small batches of 2 inserts (writing to 2 column families). The values are big blobs (up to 100 kB). Any clues? Pavel On Thu, Jun 19, 2014 at 8:07 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: Pavel, Out of curiosity, did it start to happen before some update? Which version of Cassandra are you using? []s 2014-06-19 16:10 GMT-03:00 Pavel Kogan pavel.ko...@cortica.com: What a coincidence! It happened today in my cluster of 7 nodes as well. Regards, Pavel On Wed, Jun 18, 2014 at 11:13 AM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: I have a 10 node cluster with cassandra 2.0.8. I am getting these exceptions in the log when I run my code. What my code does is just read data from a CF and in some cases write new data. WARN [Native-Transport-Requests:553] 2014-06-18 11:04:51,391 BatchStatement.java (line 228) Batch of prepared statements for [identification1.entity, identification1.entity_lookup] is of size 6165, exceeding specified threshold of 5120 by 1045. WARN [Native-Transport-Requests:583] 2014-06-18 11:05:01,152 BatchStatement.java (line 228) Batch of prepared statements for [identification1.entity, identification1.entity_lookup] is of size 21266, exceeding specified threshold of 5120 by 16146. 
WARN [Native-Transport-Requests:581] 2014-06-18 11:05:20,229 BatchStatement.java (line 228) Batch of prepared statements for [identification1.entity, identification1.entity_lookup] is of size 22978, exceeding specified threshold of 5120 by 17858. INFO [MemoryMeter:1] 2014-06-18 11:05:32,682 Memtable.java (line 481) CFS(Keyspace='OpsCenter', ColumnFamily='rollups300') liveRatio is 14.249755859375 (just-counted was 9.85302734375). calculation took 3ms for 1024 cells After some time, one node of the cluster goes down. Then it goes back after some seconds and another node goes down. It keeps happening and there is always a node down in the cluster; when it goes back, another one falls. The only exceptions I see in the log are 'connection reset by peer', which seem to be related to the gossip protocol, when a node goes down. Any hint of what I could do to investigate this problem further? Best regards, Marcelo Valle.
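The 5120-byte figure in those warnings is the batch size warn threshold; Pavel found it effectively fixed in this 2.0.x release, and later Cassandra versions expose it as batch_size_warn_threshold_in_kb in cassandra.yaml. A rough client-side pre-check is possible before sending a batch — this sketch only sums the serialized values, which is an approximation of (not identical to) the server's accounting:

```python
# Rough client-side estimate of a batch's data size against the
# 5120-byte warn threshold seen in the log messages above. Only the
# values are summed; the server's accounting also includes other
# statement overhead, so treat this as an approximation.
WARN_THRESHOLD = 5120  # bytes, per "exceeding specified threshold of 5120"

def batch_too_big(serialized_values, threshold=WARN_THRESHOLD):
    total = sum(len(v) for v in serialized_values)
    return total > threshold, total

# One 6165-byte blob, like the first warning in the log:
too_big, size = batch_too_big([b"x" * 6165])
assert too_big and size == 6165
```

When the check trips, the options are the ones discussed in the thread: split the batch, or drop the batch entirely and issue individual async inserts.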
Using Cassandra as cache
Hi, In our project, many distributed modules send each other binary blobs, up to 100-200 kB each on average. Small JSONs are sent over a message queue, while Cassandra is used as temporary storage for blobs. We are using Cassandra instead of an in-memory distributed cache like Couch for the following reasons: (1) We don't want to be limited by RAM size (2) We make intensive use of ordered composite keys and ranges (it is not a simple key/value cache). We don't use the TTL mechanism, for several reasons. The major reason is that we need to reclaim free disk space immediately and not after 10 days (gc_grace). We are very limited in disk space because traffic is intensive and blobs are big. So what we did is create a new keyspace every hour, named _MM_dd_HH, and when disk becomes full, a script running in crontab on each node drops the keyspace with the IF EXISTS flag and deletes the whole keyspace folder. That way the whole process is very clean and no garbage is left on disk. The keyspace is created by the first module in the flow on an hourly basis and its name is sent over the message queue, to avoid possible problems. All modules read and write with consistency ONE and of course there is no replication. Actually it works nicely, but we have several problems: 1) When a new keyspace with its column families has just been created (every round hour), sometimes other modules fail to read/write data, and we lose the request. Can it be that creation of a keyspace and column families is an async operation, or that there is propagation time between nodes? 2) We are reading and writing intensively, and usually I don't need the data for more than 1-2 hours. What optimizations can I do to increase my small cluster's read performance? Cluster configuration - 3 identical nodes: i7 3GHz, SSD 120Gb, 16Gb RAM, CentOS 6. Hope not too much text :) Thanks, Pavel
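The hourly-keyspace scheme described above boils down to deriving the keyspace name from the clock, so every module computes the same name for the same hour. A sketch of the naming half — the "blobs" prefix and exact strftime pattern are assumptions to match the _MM_dd_HH fragment in the post:

```python
from datetime import datetime

# Derive the hourly keyspace name from a timestamp. The exact pattern
# (prefix, year handling) is assumed here; in the scheme above, the
# first module in the flow would CREATE KEYSPACE under this name and
# publish it on the message queue, and a cron job later drops it.
def hourly_keyspace(ts, prefix="blobs"):
    return ts.strftime(prefix + "_%m_%d_%H")

name = hourly_keyspace(datetime(2014, 6, 20, 23))
assert name == "blobs_06_20_23"
```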
Re: Using Cassandra as cache
Am 20.06.2014 um 23:48 schrieb Pavel Kogan pavel.ko...@cortica.com: 1) When a new keyspace with its column families has just been created (every round hour), sometimes other modules fail to read/write data, and we lose the request. Can it be that creation of a keyspace and column families is an async operation, or that there is propagation time between nodes? Schema needs to settle down (nodes actually agree on a common view) - this may take several seconds until all nodes have that common view. Turn on DEBUG output in the Java driver, for example, to see these messages. CL ONE requires the one node to be up and running - if that node's not running, your request will definitely fail. Maybe you want to try CL ANY or increase RF to 2. 2) We are reading and writing intensively, and usually I don't need the data for more than 1-2 hours. What optimizations can I do to increase my small cluster's read performance? Cluster configuration - 3 identical nodes: i7 3GHz, SSD 120Gb, 16Gb RAM, CentOS 6. Depending on the data, table layout, access patterns and C* version, try various key cache and maybe row cache configurations in both table options and cassandra.yaml
Re: Using Cassandra as cache
On Fri, Jun 20, 2014 at 2:48 PM, Pavel Kogan pavel.ko...@cortica.com wrote: So what we did is create a new keyspace every hour, named _MM_dd_HH, and when disk becomes full, a script running in crontab on each node drops the keyspace with the IF EXISTS flag and deletes the whole keyspace folder. That way the whole process is very clean and no garbage is left on disk. I've recommended a similar technique in the past, but with alternating between Keyspace_A and Keyspace_B. That way you just TRUNCATE them instead of having to DROP. DROP/CREATE keyspace has problems that TRUNCATE does not. Perhaps use a TRUNCATE-oriented technique? =Rob
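Rob's alternating scheme can be as simple as flipping between the two fixed keyspaces on some schedule, writing into one while the other is truncated. A toy sketch alternating on hour parity — the names and the parity rule are illustrative, not from the original posts:

```python
# Alternate between two fixed keyspaces so the idle one can be
# TRUNCATEd instead of DROPped and re-CREATEd. Hour parity is just
# one possible switching rule.
def active_keyspace(hour):
    return "Keyspace_A" if hour % 2 == 0 else "Keyspace_B"

def idle_keyspace(hour):
    # The idle keyspace is the one safe to truncate this hour.
    return "Keyspace_B" if hour % 2 == 0 else "Keyspace_A"

assert active_keyspace(10) == "Keyspace_A"
assert idle_keyspace(10) == "Keyspace_B"
assert active_keyspace(11) == "Keyspace_B"
```

The benefit is that the schema objects stay in place, avoiding the schema-propagation race discussed in this thread for freshly created keyspaces.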
Re: Using Cassandra as cache
Schema propagation takes time: https://issues.apache.org/jira/browse/CASSANDRA-5725 @Robert: do we still need to clean up the snapshot manually when truncating? I remember that on the 1.2 branch, even though the auto_snapshot param was set to false, truncating led to snapshot creation that forced us to manually remove the snapshot folder on disk On Sat, Jun 21, 2014 at 12:01 AM, Robert Stupp sn...@snazy.de wrote: Am 20.06.2014 um 23:48 schrieb Pavel Kogan pavel.ko...@cortica.com: 1) When a new keyspace with its column families has just been created (every round hour), sometimes other modules fail to read/write data, and we lose the request. Can it be that creation of a keyspace and column families is an async operation, or that there is propagation time between nodes? Schema needs to settle down (nodes actually agree on a common view) - this may take several seconds until all nodes have that common view. Turn on DEBUG output in the Java driver, for example, to see these messages. CL ONE requires the one node to be up and running - if that node's not running, your request will definitely fail. Maybe you want to try CL ANY or increase RF to 2. 2) We are reading and writing intensively, and usually I don't need the data for more than 1-2 hours. What optimizations can I do to increase my small cluster's read performance? Cluster configuration - 3 identical nodes: i7 3GHz, SSD 120Gb, 16Gb RAM, CentOS 6. Depending on the data, table layout, access patterns and C* version, try various key cache and maybe row cache configurations in both table options and cassandra.yaml
Re: Using Cassandra as cache
Thanks Robert, Can you please explain what problems DROP/CREATE keyspace may cause? Seems like truncate works per column family, and I have up to 10. What should I delete from disk in that case? I can't delete the whole folder, right? I need to delete all the content under each cf folder, but not the folders? Correct? Pavel On Fri, Jun 20, 2014 at 6:01 PM, Robert Coli rc...@eventbrite.com wrote: On Fri, Jun 20, 2014 at 2:48 PM, Pavel Kogan pavel.ko...@cortica.com wrote: So what we did is create a new keyspace every hour, named _MM_dd_HH, and when disk becomes full, a script running in crontab on each node drops the keyspace with the IF EXISTS flag and deletes the whole keyspace folder. That way the whole process is very clean and no garbage is left on disk. I've recommended a similar technique in the past, but with alternating between Keyspace_A and Keyspace_B. That way you just TRUNCATE them instead of having to DROP. DROP/CREATE keyspace has problems that TRUNCATE does not. Perhaps use a TRUNCATE-oriented technique? =Rob
Re: Using Cassandra as cache
Thanks, Is there any programmatic way to know when the schema has finished settling down? Can working with RF=2 and CL=ANY result in any problem with consistency? I am not sure I can have problems with consistency if I don't do updates, only writes and reads. Can I? By the way, I am using Cassandra 2.0.8. Pavel On Fri, Jun 20, 2014 at 6:01 PM, Robert Stupp sn...@snazy.de wrote: Am 20.06.2014 um 23:48 schrieb Pavel Kogan pavel.ko...@cortica.com: 1) When a new keyspace with its column families has just been created (every round hour), sometimes other modules fail to read/write data, and we lose the request. Can it be that creation of a keyspace and column families is an async operation, or that there is propagation time between nodes? Schema needs to settle down (nodes actually agree on a common view) - this may take several seconds until all nodes have that common view. Turn on DEBUG output in the Java driver, for example, to see these messages. CL ONE requires the one node to be up and running - if that node's not running, your request will definitely fail. Maybe you want to try CL ANY or increase RF to 2. 2) We are reading and writing intensively, and usually I don't need the data for more than 1-2 hours. What optimizations can I do to increase my small cluster's read performance? Cluster configuration - 3 identical nodes: i7 3GHz, SSD 120Gb, 16Gb RAM, CentOS 6. Depending on the data, table layout, access patterns and C* version, try various key cache and maybe row cache configurations in both table options and cassandra.yaml
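One common way to answer Pavel's first question programmatically is to compare the schema_version reported by system.local with the versions in system.peers until they all match; the DataStax drivers do essentially this internally when waiting for schema agreement. A sketch of the comparison logic only, with the actual driver queries left as comments:

```python
# Schema is "settled" when every node reports the same schema version.
# In a real client you would fetch these values with:
#   SELECT schema_version FROM system.local
#   SELECT schema_version FROM system.peers
# and retry (with a timeout) until they agree.
def schema_agreed(local_version, peer_versions):
    return all(v == local_version for v in peer_versions)

assert schema_agreed("uuid-1", ["uuid-1", "uuid-1"])
assert not schema_agreed("uuid-1", ["uuid-1", "uuid-2"])
```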
output interpretation of cassandra-stress
Hi all, I have a quick question on the unit of the latency in the output of cassandra-stress: is it milliseconds or seconds? I cannot find the answer in the documentation: http://www.datastax.com/documentation/cassandra/1.2/cassandra/tools/toolsCStressOutput_c.html Thanks, Senhua
Use Cassandra thrift API with collection type
Hi, I have a problem when inserting data of the map type into a Cassandra table. I tried all kinds of MapSerializer to serialize the Map data and did not succeed. My code is like this:

Column column = new Column();
column.name = columnSerializer.toByteBuffer(colname); // the column name of the map type; it works with other kinds of data type
column.value = MapSerializer.getInstance(AsciiSerializer.instance, DecimalSerializer.instance).serialize(someMapData);
column.timestamp = new Date().getTime();
Mutation mutation = new Mutation();
mutation.column_or_supercolumn = new ColumnOrSuperColumn();
mutation.column_or_supercolumn.column = column;
mutationList.add(mutation);

The data was inserted into the Cassandra DB; however, it cannot be retrieved by CQL3, failing with the following error:

ERROR 14:32:48,192 Exception in thread Thread[Thrift:4,5,main] java.lang.AssertionError
at org.apache.cassandra.cql3.statements.ColumnGroupMap.getCollection(ColumnGroupMap.java:88)
at org.apache.cassandra.cql3.statements.SelectStatement.getCollectionValue(SelectStatement.java:1185)
at org.apache.cassandra.cql3.statements.SelectStatement.handleGroup(SelectStatement.java:1169)
at org.apache.cassandra.cql3.statements.SelectStatement.processColumnFamily(SelectStatement.java:1076)
...

So the question is how to write map data into Cassandra via the thrift API. Any help appreciated. Thanks, Huiliang
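The AssertionError above is consistent with a layout mismatch: CQL3 stores each map entry as its own internal column whose composite name includes the CQL column name and the map key, rather than one column holding a fully serialized map. A sketch of the CompositeType wire encoding (for each component: a 2-byte big-endian length, the component bytes, then one end-of-component byte) — note that the exact component layout also depends on the table's clustering columns, so treat this as an illustration rather than a drop-in fix:

```python
import struct

# CompositeType wire format: for each component, a 2-byte big-endian
# length, the component bytes, then a 0x00 end-of-component byte.
# A CQL3 map entry's internal column name is (roughly) a composite of
# the CQL column name and the map key; the map value goes in the cell
# value, one internal column per entry.
def composite(*components):
    out = b""
    for c in components:
        out += struct.pack(">H", len(c)) + c + b"\x00"
    return out

name = composite(b"mymap", b"key1")
assert name == b"\x00\x05mymap\x00\x00\x04key1\x00"
```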
How is null handled in terms of storage when using static schemas?
Let's say we have a table with just an integer primary key named ID and a text column named VALUE… if we set the value to (0, 'hello world'), obviously that's a normal value. However, what happens if we update it with (0, null)? How is the 'null' stored? I couldn't find any documentation for this anywhere. The new null supersedes the older value of 'hello world', so I assume it has to write it into an SSTable before both SSTables are compacted. -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* Skype: *burtonator* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
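For what it's worth, the usual answer is that a null write is stored as a tombstone: a small marker cell whose newer timestamp shadows the older value until compaction removes both (and, after gc_grace, the tombstone itself). The reconciliation rule is essentially last-write-wins by timestamp; a toy sketch of that rule:

```python
# Toy last-write-wins reconciliation. Each cell is (timestamp, value);
# value None models a tombstone. A tombstone with a newer timestamp
# shadows the older live value; compaction eventually drops the
# shadowed value, and the tombstone itself after gc_grace.
def reconcile(cell_a, cell_b):
    return max(cell_a, cell_b, key=lambda cell: cell[0])

old = (100, "hello world")
tombstone = (200, None)     # the null write, with a newer timestamp
winner = reconcile(old, tombstone)
assert winner == (200, None)   # the tombstone wins until purged
```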