Impact of running major compaction with Size Tiered Compaction - version 1.1.11
Hi. We have a 6 node cluster in two DCs, Cassandra version 1.1.11, RF=3 in each DC. The DataStax documentation (http://www.datastax.com/docs/1.1/references/nodetool#nodetool-compact) says the following:

"Initiate a major compaction through nodetool compact. A major compaction merges all SSTables into one. Though major compaction can free disk space used by accumulated SSTables, during runtime it temporarily doubles disk space usage and is I/O and CPU intensive. After running a major compaction, automatic minor compactions are no longer triggered on a frequent basis. Consequently, you will then have to manually run major compactions on a routine basis. Expect read performance to improve immediately following a major compaction, and then to continually degrade until you invoke the next major compaction. For this reason, DataStax does not recommend major compaction."

A maintenance procedure has been run periodically on the nodes in the cluster which performs repair -pr, flush, compact, then cleanup. This runs fine for all CFs except one, which is very large with large rows. The entries all have TTLs shorter than gc_grace. The maintenance just completed after running for 9+ hours; the SSTables for the CF are currently:

  19977911  Dec 27 06:38  -hf-57288-Data.db
      5817  Dec 27 06:52  -hf-57304-Data.db
2735747237  Dec 27 06:52  -hf-57291-Data.db
    718192  Dec 27 06:52  -hf-57305-Data.db
2581373226  Dec 29 16:48  -hf-57912-Data.db
 936062446  Jan  9 22:22  -hf-58875-Data.db
 235463043  Jan 10 05:23  -hf-5-Data.db
  60851675  Jan 10 08:33  -hf-58893-Data.db
  60871570  Jan 10 11:44  -hf-58898-Data.db
  60537384  Jan 10 14:54  -hf-58903-Data.db

min_compaction_threshold is set to 4. Now for the questions:

1) Given that the DataStax recommendation was not followed - will minor compactions still be triggered if major compactions are no longer performed?
2) Would the maintenance steps (repair -pr, flush, and cleanup) still be useful? Thanks
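Regarding question 1, the way size-tiered minor compactions pick candidates can be sketched as follows. This is a simplified model of SizeTieredCompactionStrategy's bucketing, not the exact 1.1 code; the 0.5/1.5 bucket bounds are assumed defaults, and the threshold of 4 mirrors the min_compaction_threshold above:

```python
def stcs_buckets(sstable_sizes, bucket_low=0.5, bucket_high=1.5, min_threshold=4):
    """Group SSTables whose size is within [avg*low, avg*high] of a bucket's
    running average (simplified sketch of size-tiered bucketing)."""
    buckets = []  # each bucket: [running average, [member sizes]]
    for size in sorted(sstable_sizes):
        for bucket in buckets:
            avg = bucket[0]
            if avg * bucket_low <= size <= avg * bucket_high:
                bucket[1].append(size)
                bucket[0] = sum(bucket[1]) / len(bucket[1])
                break
        else:
            buckets.append([size, [size]])
    # only buckets reaching min_threshold are candidates for a minor compaction
    return [b[1] for b in buckets if len(b[1]) >= min_threshold]

# The SSTable sizes from the listing above:
sizes = [19977911, 5817, 2735747237, 718192, 2581373226,
         936062446, 235463043, 60851675, 60871570, 60537384]
```

Run against the listing, no bucket reaches the threshold of 4 (the three ~60 MB tables form a bucket of 3, the two ~2.6 GB tables a bucket of 2), so no minor compaction fires yet. It will fire again once four similar-sized SSTables accumulate; a past major compaction does not disable minors, it just leaves one giant SSTable that rarely finds similarly sized peers.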
Is there a client side method of determining the Cassandra version to which it is connected?
This question is specific to Thrift - but we are in the process of moving to CQL - so an answer for either client is fine. Thanks
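For what it's worth: Thrift's describe_version() returns the Thrift API version (e.g. "19.33.0"), not the Cassandra release version, so it only maps to a release indirectly. On the CQL side (1.2+), the release version can be read from the system.local table with SELECT release_version FROM system.local. A small helper for turning either result into a comparable tuple (the session/driver call in the comment is an assumed driver-style API, shown for illustration only):

```python
def parse_release_version(version_string):
    """Turn a version string like "1.1.11" into a comparable tuple (1, 1, 11)."""
    return tuple(int(part) for part in version_string.split(".")[:3])

# Hypothetical usage with a CQL driver session (assumed API, Cassandra 1.2+):
#   row = session.execute("SELECT release_version FROM system.local").one()
#   if parse_release_version(row.release_version) >= (1, 2, 0):
#       ...  # enable newer features client side
```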
RE: data clean up problem
How do you determine the slow node - client side response latency?

-----Original Message-----
From: Hiller, Dean [mailto:dean.hil...@nrel.gov]
Sent: Tuesday, May 28, 2013 1:10 PM
To: user@cassandra.apache.org
Subject: Re: data clean up problem

How much disk is used on each node? We run the suggested 300G per node, as above that compactions can have trouble keeping up. Ps. We run compactions during peak hours just fine, because our client reroutes to the 2 of 3 nodes not running compactions based on seeing the slow node, so performance stays fast. The easy route is of course to double your cluster and halve the data size per node so compaction can keep up.

Dean

From: cem <cayiro...@gmail.com>
Date: Tuesday, May 28, 2013 1:45 PM
To: user@cassandra.apache.org
Subject: Re: data clean up problem

Thanks for the answer. Sorry for the misunderstanding. I meant that I don't send delete requests from the client, so it is safe to set gc_grace to 0. TTL is used for data clean-up. I am not running a manual compaction. I tried that once, but it took a long time to finish and I will not have that amount of off-peak time in production. I even set the compaction throughput to unlimited and it didn't help much. Disk usage just keeps growing, even though I know there is enough space to store 1 day of data. What do you think about time range partitioning? Creating a new column family for each partition and dropping it when you know that all records are expired. I have 5 nodes.

Cem.

On Tue, May 28, 2013 at 9:37 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote:
Also, how many nodes are you running?
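One way to answer the "how do you determine the slow node" question client side is to keep a moving average of response latency per host and route requests to the fastest replicas, much as described above. This is a sketch of the idea, not any particular driver's API (real drivers ship comparable latency-aware policies):

```python
class LatencyTracker:
    """Exponentially weighted moving average of per-host response latency."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha   # weight of the newest sample
        self.avg = {}        # host -> EWMA latency in ms

    def record(self, host, latency_ms):
        # first sample seeds the average; later samples blend in with weight alpha
        prev = self.avg.get(host, latency_ms)
        self.avg[host] = (1 - self.alpha) * prev + self.alpha * latency_ms

    def fastest(self, hosts, count):
        """Pick the `count` hosts with the lowest observed latency,
        e.g. the 2 of 3 replicas not busy compacting."""
        return sorted(hosts, key=lambda h: self.avg.get(h, 0.0))[:count]
```

A node that is grinding through a compaction will accumulate a visibly higher average and drop out of the selection until it recovers.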
From: cem <cayiro...@gmail.com>
Date: Tuesday, May 28, 2013 1:17 PM
To: user@cassandra.apache.org
Subject: Re: data clean up problem

Thanks for the answer, but it is already set to 0 since I don't do any deletes.

Cem

On Tue, May 28, 2013 at 9:03 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
You need to change the gc_grace time of the column family. It defaults to 10 days. By default the tombstones will not go away for 10 days.

On Tue, May 28, 2013 at 2:46 PM, cem <cayiro...@gmail.com> wrote:
Hi Experts,

We have a general problem with cleaning up data from disk. I need to free disk space after the retention period, and the customer wants to dimension the disk space based on that. After running multiple performance tests with a TTL of 1 day, we saw that compaction couldn't keep up with the request rate. Disks were getting full after 3 days, and there were also a lot of SSTables older than 1 day at that point.

Things that we tried:
- Change the compaction strategy to leveled (helped a bit, but not much).
- Use a big SSTable size (10G) with leveled compaction, for more aggressive compaction (helped a bit, but not much).
- Upgrade Cassandra from 1.0 to 1.2 to use TTL histograms (didn't help at all, since the key-overlap estimation algorithm generates a 100% match. Although we don't have...)

Our column family structure is like this:

event_data_cf (we store event data; event_id is randomly generated and each event has attributes like location=london):
  row key: event id -> data blob
timeseries_cf (key is the attribute that we want to index, e.g. location=london; we didn't use secondary indexes because the indexes are dynamic):
  row key: index key -> time series of event ids (event1_id, event2_id)
timeseries_inv_cf (used for removing an event by its row key):
  row key: event id -> set of index keys

Candidate solution: implementing time range partitions. Each partition will have its own column family set and will be managed by the client. Suppose that you want a 7 day retention period: configure the partition size as 1 day and keep 7 active partitions at any time, then drop the inactive partitions (older than 7 days). Dropping immediately removes the data from disk. (With proper Cassandra.yaml
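The candidate solution above boils down to simple date arithmetic on the client. A minimal sketch, assuming per-day partitions and hypothetical column family names derived from a base name (the names and the 7-day window are illustrative, not from this thread):

```python
from datetime import datetime, timedelta

def partition_cf_name(base_cf, ts):
    """Name of the per-day column family for a timestamp, e.g. event_data_20130528."""
    return "%s_%s" % (base_cf, ts.strftime("%Y%m%d"))

def partitions_to_drop(base_cf, now, retention_days=7, lookback_days=30):
    """CF names older than the retention window. Issuing DROP COLUMN FAMILY on
    each removes its data from disk immediately, with no tombstones to compact."""
    return [partition_cf_name(base_cf, now - timedelta(days=age))
            for age in range(retention_days, lookback_days + 1)]
```

Writes go to partition_cf_name(base, event_time); reads fan out over the active partitions; a daily job drops everything partitions_to_drop returns.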
RE: Question regarding multi datacenter and LOCAL_QUORUM
Yes - using NetworkTopologyStrategy

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Thursday, March 21, 2013 10:22 AM
To: user@cassandra.apache.org
Subject: Re: Question regarding multi datacenter and LOCAL_QUORUM

DEBUG [Thrift:1] 2013-03-19 00:00:53,313 ReadCallback.java (line 79) Blockfor is 2; setting up requests to /xx.yy.zz.146,/xx.yy.zz.143,/xx.yy.zz.145
DEBUG [Thrift:1] 2013-03-19 00:00:53,334 CassandraServer.java (line 306) get_slice
DEBUG [Thrift:1] 2013-03-19 00:00:53,334 ReadCallback.java (line 79) Blockfor is 2; setting up requests to /xx.yy.zz.146,/xx.yy.zz.143
DEBUG [Thrift:1] 2013-03-19 00:00:53,366 CassandraServer.java (line 306) get_slice
DEBUG [Thrift:1] 2013-03-19 00:00:53,367 ReadCallback.java (line 79) Blockfor is 2; setting up requests to /xx.yy.zz.146,/xx.yy.zz.143,/xx.yy.zz.145

This is Read Repair, as controlled by the read_repair_chance and dclocal_read_repair_chance CF settings, in action. Blockfor is how many nodes the read operation is going to wait for. When the number of nodes in the request is more than blockfor, it means Read Repair is active: we are reading from all UP nodes and will repair any detected differences in the background. Your read is waiting for 2 nodes to respond only (including the one we ask for the data). The odd thing here is that there are only 3 replica nodes. Are you using the NetworkTopologyStrategy? If so, I would expect there to be 6 nodes in the request with RR, 3 in each DC.

Cheers
- Aaron Morton, Freelance Cassandra Consultant, New Zealand, @aaronmorton, http://www.thelastpickle.com

On 21/03/2013, at 12:38 PM, Tyler Hobbs <ty...@datastax.com> wrote:
On Wed, Mar 20, 2013 at 3:18 PM, Tycen Stafford <tstaff...@medio.com> wrote:
I don't think that's correct for a multi-dc ring, but you'll want to hear a final answer from someone more authoritative.
I could easily be wrong. Try using the built-in token generating tool (token-generator) - I don't seem to have it on my hosts (1.1.6 also) so I can't confirm. I used the tokentoolv2.py tool (from here: http://www.datastax.com/docs/1.0/initialize/token_generation) and got the following (which looks to me evenly spaced and not using offsets):

tstafford@tycen-linux:Cassandra$ ./tokentoolv2.py 3 3
{
    0: {
        0: 0,
        1: 56713727820156410577229101238628035242,
        2: 113427455640312821154458202477256070485
    },
    1: {
        0: 28356863910078205288614550619314017621,
        1: 85070591730234615865843651857942052863,
        2: 141784319550391026443072753096570088106
    }
}

For multi-DC clusters, the only requirement for a balanced cluster is that all tokens within a DC must be balanced; you can basically treat each DC as a separate ring (as long as your tokens don't line up exactly). So either using an offset for the second DC or evenly spacing all nodes is acceptable.

--
Tyler Hobbs, DataStax (http://datastax.com/)
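The tool output above is straightforward to reproduce: within each DC the tokens are evenly spaced over the RandomPartitioner range, and each subsequent DC's ring is rotated by an interleaving offset. A sketch (my own reconstruction of what tokentoolv2.py appears to compute, not its actual source):

```python
RING_RANGE = 2 ** 127  # RandomPartitioner token space in Cassandra 1.x

def dc_tokens(nodes_in_dc, dc_index, dc_count):
    """Evenly spaced tokens for one DC, with each DC's ring rotated by an
    interleaving offset so tokens never line up exactly across DCs."""
    offset = dc_index * RING_RANGE // (nodes_in_dc * dc_count)
    return [i * RING_RANGE // nodes_in_dc + offset for i in range(nodes_in_dc)]
```

dc_tokens(3, 0, 2) and dc_tokens(3, 1, 2) reproduce the two DCs' tokens shown above exactly.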
RE: Question regarding multi datacenter and LOCAL_QUORUM
Aaron - here you go:

create keyspace sipfs
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {AZ1 : 3, AZ2 : 3}
  and durable_writes = true;

The CFs all have the following:

create column family xxx
  with column_type = 'Standard'
  and comparator = 'UTF8Type'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 0.1
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'KEYS_ONLY'
  and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};

Regards

-----Original Message-----
From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Thursday, March 21, 2013 12:47 PM
To: user@cassandra.apache.org
Subject: Re: Question regarding multi datacenter and LOCAL_QUORUM

Can you provide the full create keyspace statement?

> Yes - using NetworkTopologyStrategy
mmm, maybe it thinks the other nodes are down.
Cheers
- Aaron Morton, Freelance Cassandra Consultant, New Zealand, @aaronmorton, http://www.thelastpickle.com

On 22/03/2013, at 6:42 AM, Dwight Smith <dwight.sm...@genesyslab.com> wrote:
> Yes - using NetworkTopologyStrategy
Question regarding multi datacenter and LOCAL_QUORUM
Hi. I have 2 data centers, with 3 nodes in each DC, version 1.1.6, replication factor 2, topology properties:

# Cassandra Node IP=Data Center:Rack
xx.yy.zz.143=AZ1:RAC1
xx.yy.zz.145=AZ1:RAC1
xx.yy.zz.146=AZ1:RAC1
xx.yy.zz.147=AZ2:RAC2
xx.yy.zz.148=AZ2:RAC2
xx.yy.zz.149=AZ2:RAC2

Using LOCAL_QUORUM, my understanding was that reads/writes would process locally (on the coordinator) and send requests to the remaining nodes in the DC, but in the system log for 146 I observe that this is not the case. Extract from the log:

DEBUG [Thrift:1] 2013-03-19 00:00:53,312 CassandraServer.java (line 306) get_slice
DEBUG [Thrift:1] 2013-03-19 00:00:53,313 ReadCallback.java (line 79) Blockfor is 2; setting up requests to /xx.yy.zz.146,/xx.yy.zz.143,/xx.yy.zz.145
DEBUG [Thrift:1] 2013-03-19 00:00:53,334 CassandraServer.java (line 306) get_slice
DEBUG [Thrift:1] 2013-03-19 00:00:53,334 ReadCallback.java (line 79) Blockfor is 2; setting up requests to /xx.yy.zz.146,/xx.yy.zz.143
DEBUG [Thrift:1] 2013-03-19 00:00:53,366 CassandraServer.java (line 306) get_slice
DEBUG [Thrift:1] 2013-03-19 00:00:53,367 ReadCallback.java (line 79) Blockfor is 2; setting up requests to /xx.yy.zz.146,/xx.yy.zz.143,/xx.yy.zz.145
DEBUG [Thrift:1] 2013-03-19 00:00:53,391 CassandraServer.java (line 589) batch_mutate
DEBUG [Thrift:1] 2013-03-19 00:00:53,418 CassandraServer.java (line 589) batch_mutate
DEBUG [Thrift:1] 2013-03-19 00:00:53,429 CassandraServer.java (line 306) get_slice
DEBUG [Thrift:1] 2013-03-19 00:00:53,429 ReadCallback.java (line 79) Blockfor is 2; setting up requests to /xx.yy.zz.146,/xx.yy.zz.145
DEBUG [Thrift:1] 2013-03-19 00:00:53,441 CassandraServer.java (line 306) get_slice
DEBUG [Thrift:1] 2013-03-19 00:00:53,441 ReadCallback.java (line 79) Blockfor is 2; setting up requests to /xx.yy.zz.146,/xx.yy.zz.143

The batch mutates are as expected - locally, two replicas, and hints to DC AZ2 - but why the unexpected behavior for the get_slice requests? This is observed throughout the log.
Thanks much
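The "Blockfor is 2" lines in the log follow directly from the quorum arithmetic, and the varying endpoint counts follow from read repair (as the replies below the original explain). A small sketch of both rules; the function names are mine, for illustration:

```python
def blockfor(replication_factor):
    """Replicas a (LOCAL_)QUORUM read waits for: floor(RF/2) + 1."""
    return replication_factor // 2 + 1

def read_request_endpoints(live_replicas, rf, read_repair_active):
    """Endpoints a coordinator contacts for one read: all live replicas when
    read repair fires for that request, otherwise just enough to satisfy
    the consistency level. Either way it only *waits* for blockfor(rf)."""
    if read_repair_active:
        return list(live_replicas)
    return list(live_replicas)[:blockfor(rf)]
```

With RF=3 per DC, blockfor(3) == 2, which matches every "Blockfor is 2" line; the requests listing 3 endpoints are the fraction of reads (read_repair_chance) that also query the extra replica to repair differences in the background.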
RE: Question regarding multi datacenter and LOCAL_QUORUM
Slight correction - replication factor is 3. Will obtain nodetool ring info to verify tokens.

From: Dwight Smith [mailto:dwight.sm...@genesyslab.com]
Sent: Wednesday, March 20, 2013 10:30 AM
To: user@cassandra.apache.org
Subject: Question regarding multi datacenter and LOCAL_QUORUM

> Hi. I have 2 data centers, with 3 nodes in each DC, version 1.1.6, replication factor 2 ...
RE: Question regarding multi datacenter and LOCAL_QUORUM
From the yamls:

.143 initial_token: 0
.145 initial_token: 56713727820156410577229101238628035242
.146 initial_token: 113427455640312821154458202477256070485

From: Tycen Stafford [mailto:tstaff...@medio.com]
Sent: Wednesday, March 20, 2013 10:43 AM
To: user@cassandra.apache.org
Subject: RE: Question regarding multi datacenter and LOCAL_QUORUM

Did you alternate your tokens? I may be off base - but if not, then that's why you might be seeing cross-dc requests.

-Tycen
RE: Question regarding multi datacenter and LOCAL_QUORUM
Actually the tokens in AZ2 are not correct. I'll get those corrected - thanks for the pointer.

From: Tycen Stafford [mailto:tstaff...@medio.com]
Sent: Wednesday, March 20, 2013 11:25 AM
To: user@cassandra.apache.org
Subject: RE: Question regarding multi datacenter and LOCAL_QUORUM

Okay - that looks alternated to me. I'm assuming that 147, 148 and 149 are this then:

28356863910078205288614550619314017621
85070591730234615865843651857942052864
141784319550391026443072753096570088106

I'm out of ideas - sorry I couldn't help more.

-Tycen
RE: Question regarding multi datacenter and LOCAL_QUORUM
Hmm - the ring output follows; the tokens in AZ2 are offset by 100:

Address       DC   Rack  Status  State   Load       Effective-Ownership  Token
                                                                         113427455640312821154458202477256070585
xx.yy.zz.143  AZ1  RAC1  Up      Normal  626.21 KB  100.00%              0
xx.yy.zz.145  AZ1  RAC1  Up      Normal  622.73 KB  100.00%              56713727820156410577229101238628035242
xx.yy.zz.146  AZ1  RAC1  Up      Normal  622.49 KB  100.00%              113427455640312821154458202477256070485
xx.yy.zz.147  AZ2  RAC2  Up      Normal  550.31 KB  100.00%              100
xx.yy.zz.148  AZ2  RAC2  Up      Normal  622.05 KB  100.00%              56713727820156410577229101238628035342
xx.yy.zz.149  AZ2  RAC2  Up      Normal  483.18 KB  100.00%              113427455640312821154458202477256070585

From: Dwight Smith
Sent: Wednesday, March 20, 2013 11:29 AM
To: user@cassandra.apache.org
Subject: RE: Question regarding multi datacenter and LOCAL_QUORUM

> Actually the tokens in AZ2 are not correct. I'll get those corrected - thanks for the pointer.
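A ring like the one above (AZ2 offset by only 100) is actually fine per Tyler's earlier rule: each DC only needs to be balanced on its own. A quick sanity check that the tokens within a DC split the ring into near-equal arcs (a sketch, assuming RandomPartitioner's 2**127 range):

```python
RING_RANGE = 2 ** 127  # RandomPartitioner token space in Cassandra 1.x

def dc_is_balanced(tokens, tolerance=1000):
    """True if the sorted tokens split the ring into arcs of near-equal size."""
    tokens = sorted(tokens)
    n = len(tokens)
    arcs = [(tokens[(i + 1) % n] - tokens[i]) % RING_RANGE for i in range(n)]
    ideal = RING_RANGE // n
    return all(abs(arc - ideal) <= tolerance for arc in arcs)

az1 = [0,
       56713727820156410577229101238628035242,
       113427455640312821154458202477256070485]
az2 = [t + 100 for t in az1]  # AZ2 offset by 100, as in the ring output above
```

Both DCs pass the check, so the tiny offset does not unbalance ownership; it only keeps the tokens from colliding exactly across DCs.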
RE: Question regarding multi datacenter and LOCAL_QUORUM
Yes, that is correct - the log is from 146. The client connects to nodes in AZ1.

From: Derek Williams [mailto:de...@fyrie.net]
Sent: Wednesday, March 20, 2013 11:50 AM
To: user@cassandra.apache.org
Subject: Re: Question regarding multi datacenter and LOCAL_QUORUM

I think I need help with pointing out what the problem is. The log you posted only contains references to 143, 145, and 146, which all appear to be in the same datacenter as 146?

--
Derek Williams
RE: Question regarding multi datacenter and LOCAL_QUORUM
Further information: in AZ1, when 143, 145, and 146 are up, all goes well. But when, say, 143 fails, the client receives a TIMEOUT failure – even though 145 and 146 are up.

From: Derek Williams [mailto:de...@fyrie.net]
Sent: Wednesday, March 20, 2013 11:50 AM
To: user@cassandra.apache.org
Subject: Re: Question regarding multi datacenter and LOCAL_QUORUM

I think I need help with pointing out what the problem is. The log you posted only contains references to 143, 145, and 146, which all appear to be in the same datacenter as 146?

On Wed, Mar 20, 2013 at 11:29 AM, Dwight Smith <dwight.sm...@genesyslab.com> wrote:
[original message and log extract quoted in full earlier in this thread – snipped]

--
Derek Williams
RE: Question regarding multi datacenter and LOCAL_QUORUM
Yes – just when the node goes down. I'll check 4705. Thanks

From: Derek Williams [mailto:de...@fyrie.net]
Sent: Wednesday, March 20, 2013 12:14 PM
To: user@cassandra.apache.org
Subject: Re: Question regarding multi datacenter and LOCAL_QUORUM

Are those timeouts happening right when the node goes down? If so it might be https://issues.apache.org/jira/browse/CASSANDRA-4705 – I don't think that issue applies if the node has been down long enough to be marked as down, though.

On Wed, Mar 20, 2013 at 12:53 PM, Dwight Smith <dwight.sm...@genesyslab.com> wrote:
Further information: in AZ1, when 143, 145, and 146 are up, all goes well. But when, say, 143 fails, the client receives a TIMEOUT failure – even though 145 and 146 are up.
[earlier messages and log extract quoted in full above – snipped]

--
Derek Williams
Specifying initial token in 1.2 fails
Hi

Just started evaluating 1.2 – starting a clean Cassandra node – the usual practice is to specify the initial token – but when I attempt to start the node the following is observed:

INFO [main] 2013-01-03 14:08:57,774 DatabaseDescriptor.java (line 203) disk_failure_policy is stop
DEBUG [main] 2013-01-03 14:08:57,774 DatabaseDescriptor.java (line 205) page_cache_hinting is false
INFO [main] 2013-01-03 14:08:57,774 DatabaseDescriptor.java (line 266) Global memtable threshold is enabled at 339MB
DEBUG [main] 2013-01-03 14:08:58,008 DatabaseDescriptor.java (line 381) setting auto_bootstrap to true
ERROR [main] 2013-01-03 14:08:58,024 DatabaseDescriptor.java (line 495) Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: For input string: 85070591730234615865843651857942052863
at org.apache.cassandra.dht.Murmur3Partitioner$1.validate(Murmur3Partitioner.java:180)
at org.apache.cassandra.config.DatabaseDescriptor.loadYaml(DatabaseDescriptor.java:433)
at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:121)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:178)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:397)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:440)

This looks like a bug. Thanks
RE: Specifying initial token in 1.2 fails
Michael

Yes indeed – my mistake. Thanks. I can specify RandomPartitioner, since I do not use indexing – yet. Just for informational purposes – with Murmur3, to achieve a balanced cluster, is the initial token method supported? If so, how should the tokens be generated? The token-generator seems to only apply to RandomPartitioner.

Thanks again

From: Michael Kjellman [mailto:mkjell...@barracuda.com]
Sent: Friday, January 04, 2013 8:39 AM
To: user@cassandra.apache.org
Subject: Re: Specifying initial token in 1.2 fails

Murmur3 != MD5 (RandomPartitioner)

From: Dwight Smith <dwight.sm...@genesyslab.com>
Date: Friday, January 4, 2013 8:36 AM
To: user@cassandra.apache.org
Subject: Specifying initial token in 1.2 fails
[original message and startup log quoted in full above – snipped]

--
Join Barracuda Networks in the fight against hunger. To learn how you can help in your community, please visit: http://on.fb.me/UAdL4f
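For reference, the failure above is a range issue: Murmur3Partitioner tokens are signed 64-bit values, so the RandomPartitioner-style token 85070591730234615865843651857942052863 (which needs 127 bits) fails validation. Evenly spaced Murmur3 initial tokens can be generated with a small sketch like this (my own helper, not an official tool):

```python
def murmur3_initial_tokens(node_count):
    # Murmur3Partitioner tokens are signed 64-bit: [-2**63, 2**63 - 1].
    # Space node_count tokens evenly around that range, starting at the minimum.
    return [i * (2**64 // node_count) - 2**63 for i in range(node_count)]

for token in murmur3_initial_tokens(4):
    print(token)
```

Each node would then get one of these values as its initial_token in cassandra.yaml.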
Question regarding the need to run nodetool repair
I have a 4 node cluster, version 1.1.2, replication factor of 4, read/write consistency of 3, level compaction. Several questions. 1) Should nodetool repair be run regularly to assure it has completed before gc_grace? If it is not run, what are the exposures? 2) If a node goes down, and is brought back up prior to the 1 hour hinted handoff expiration, should repair be run immediately? 3) If the hinted handoff has expired, the plan is to remove the node and start a fresh node in its place. Does this approach cause problems? Thanks
RE: Question regarding the need to run nodetool repair
Thanks

From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Thursday, November 15, 2012 4:30 PM
To: user@cassandra.apache.org
Subject: Re: Question regarding the need to run nodetool repair

On Thursday, November 15, 2012, Dwight Smith <dwight.sm...@genesyslab.com> wrote:

> I have a 4 node cluster, version 1.1.2, replication factor of 4, read/write consistency of 3, level compaction. Several questions.
> 1) Should nodetool repair be run regularly to assure it has completed before gc_grace? If it is not run, what are the exposures?

Yes. Lost tombstones could cause deleted data to reappear.

> 2) If a node goes down, and is brought back up prior to the 1 hour hinted handoff expiration, should repair be run immediately?

If the node is brought up prior to 1 hour, you should let the hints replay. Repair is always safe to run.

> 3) If the hinted handoff has expired, the plan is to remove the node and start a fresh node in its place. Does this approach cause problems?

You only need to join a fresh node if the node was down longer than gc_grace. Default is 10 days.

> Thanks

If you read and write at quorum and run repair regularly, you can worry less about the things above because they are essentially non-factors.
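The gc_grace constraint behind question 1 is simple arithmetic: every node must complete a repair at least once per gc_grace_seconds, or tombstones can be collected before the delete reaches all replicas. A sketch of the schedule check (the values are the defaults, not this cluster's settings):

```python
GC_GRACE_SECONDS = 864000             # default gc_grace_seconds: 10 days
REPAIR_INTERVAL_SECONDS = 7 * 86400   # e.g. a weekly repair cycle

# Leave headroom so a slow repair still finishes inside gc_grace.
headroom = GC_GRACE_SECONDS - REPAIR_INTERVAL_SECONDS
print(headroom // 86400)  # days of slack -> 3
```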
RE: Question regarding thrift login api and relation to access.properties and passwd.properties
Tyler

Thanks much

From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Tuesday, August 14, 2012 3:49 PM
To: user@cassandra.apache.org
Subject: Re: Question regarding thrift login api and relation to access.properties and passwd.properties

access.properties and passwd.properties are only used by the example implementations, SimpleAuthenticator and SimpleAuthority. Your own implementation (which requires a custom class) certainly does not have to use these; it can use any other source to make the authn/authz decision.

On Tue, Aug 14, 2012 at 4:07 PM, Dwight Smith <dwight.sm...@genesyslab.com> wrote:
The datastax documentation concisely describes how to configure and assure that the properties are used in client access. Question is this: if using the thrift api login, does C* use the Authentication class to determine access privileges based on the access/passwd properties? These questions are related to 1.1.3. Thanks

--
Tyler Hobbs
DataStax
http://datastax.com/
Question regarding thrift login api and relation to access.properties and passwd.properties
The datastax documentation concisely describes how to configure and assure that the properties are used in client access. Question is this: if using the thrift api login, does C* use the Authentication class to determine access privileges based on the access/passwd properties? These questions are related to 1.1.3. Thanks
RE: Problem with cassandra startup on Linux
Aaron

Yes will do – I had already made the suggested change – cluster is up and running. Thanks

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, August 13, 2012 1:56 AM
To: user@cassandra.apache.org
Subject: Re: Problem with cassandra startup on Linux

Hi Dwight,

I can confirm that issue on my MBP under Mountain Lion. Can you create a ticket at https://issues.apache.org/jira/browse/CASSANDRA and include the platform you are running on?

For reference, the change was added by https://issues.apache.org/jira/browse/CASSANDRA-4447. The change is only relevant if you are running on Java 7. As a workaround, change the relevant section of cassandra-env.sh to look like:

#startswith () [ ${1#$2} != $1 ]

if [ `uname` = Linux ] ; then
    # reduce the per-thread stack size to minimize the impact of Thrift
    # thread-per-client. (Best practice is for client connections to
    # be pooled anyway.) Only do so on Linux where it is known to be
    # supported.
    #if startswith $JVM_VERSION '1.7.'
    #then
    #    JVM_OPTS="$JVM_OPTS -Xss160k"
    #else
        JVM_OPTS="$JVM_OPTS -Xss128k"
    #fi
fi

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 13/08/2012, at 5:16 PM, Dwight Smith <dwight.sm...@genesyslab.com> wrote:
Installed 1.1.3 on my Linux cluster – the JVM_OPTS were truncated due to a script error in cassandra-env.sh. Invalid token in the following:
startswith () [ ${1#$2} != $1 ]
RE: Problem with cassandra startup on Linux
Aaron

https://issues.apache.org/jira/browse/CASSANDRA-4535 created. Thanks much

From: Dwight Smith [mailto:dwight.sm...@genesyslab.com]
Sent: Monday, August 13, 2012 7:54 AM
To: user@cassandra.apache.org
Subject: RE: Problem with cassandra startup on Linux
[previous messages, including Aaron's cassandra-env.sh workaround, quoted in full above – snipped]
Problem with cassandra startup on Linux
Installed 1.1.3 on my Linux cluster – the JVM_OPTS were truncated due to a script error in cassandra-env.sh. Invalid token in the following:

startswith () [ ${1#$2} != $1 ]
Problem with version 1.1.3
Hi all

Just replaced (clean install) version 1.0.9 with 1.1.3 – two node Amazon cluster. After yaml modification and starting both nodes, they do not see each other:

Note: Ownership information does not include topology, please specify a keyspace.
Address        DC           Rack   Status  State   Load      Owns      Token
10.168.87.107  datacenter1  rack1  Up      Normal  9.07 KB   100.00%   0

Address        DC           Rack   Status  State   Load      Effective-Ownership  Token
10.171.77.39   datacenter1  rack1  Up      Normal  36.16 KB  100.00%   85070591730234615865843651857942052863

Help please
RE: Problem with version 1.1.3
Yes – BUT they are the node hostnames and not the ip addresses.

From: Derek Barnes [mailto:sj.clim...@gmail.com]
Sent: Friday, August 10, 2012 2:00 PM
To: user@cassandra.apache.org
Subject: Re: Problem with version 1.1.3

Do both nodes refer to one another as seeds in cassandra.yaml?

On Fri, Aug 10, 2012 at 1:46 PM, Dwight Smith <dwight.sm...@genesyslab.com> wrote:
[original message and nodetool ring output quoted in full above – snipped]
RE: Problem with version 1.1.3
Derek

I added both node hostnames to the seeds and it now has the correct nodetool ring:

Address        DC           Rack   Status  State   Load     Owns    Token
                                                                    85070591730234615865843651857942052863
10.168.87.107  datacenter1  rack1  Up      Normal  13.5 KB  50.00%  0
10.171.77.39   datacenter1  rack1  Up      Normal  13.5 KB  50.00%  85070591730234615865843651857942052863

Thanks for the hint.

From: Derek Barnes [mailto:sj.clim...@gmail.com]
Sent: Friday, August 10, 2012 2:00 PM
To: user@cassandra.apache.org
Subject: Re: Problem with version 1.1.3

Do both nodes refer to one another as seeds in cassandra.yaml?

On Fri, Aug 10, 2012 at 1:46 PM, Dwight Smith <dwight.sm...@genesyslab.com> wrote:
[original message and nodetool ring output quoted in full above – snipped]
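As a sanity check on the balanced two-node ring above: RandomPartitioner tokens live in [0, 2^127), and the second node's token equals (2^127 - 1) // 2, i.e. the midpoint of that range rounded down. A quick sketch of even token spacing (the helper is mine, not a Cassandra tool):

```python
RP_RANGE = 2**127  # RandomPartitioner token space: [0, 2**127)

def balanced_tokens(node_count):
    # Evenly spaced initial tokens, with node 0 at token 0.
    return [i * RP_RANGE // node_count for i in range(node_count)]

print(balanced_tokens(2))
# The ring output above shows 85070591730234615865843651857942052863,
# which is (2**127 - 1) // 2: one less than the exact midpoint 2**126,
# consistent with a token generator that divided 2**127 - 1 instead.
print((RP_RANGE - 1) // 2)
```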
RE: Problem with version 1.1.3
Further info – it seems I had the seeds list backwards – it did not need both nodes. I have corrected that with each node pointing to the other as a single seed entry, and it works fine. Thanks again for the quick response.

From: Dwight Smith [mailto:dwight.sm...@genesyslab.com]
Sent: Friday, August 10, 2012 2:15 PM
To: user@cassandra.apache.org
Subject: RE: Problem with version 1.1.3
[previous messages in this thread quoted in full above – snipped]
Frequent exception with Cassandra 1.0.9
I am running embedded Cassandra version 1.0.9 on Windows 2008 Server and frequently encounter the following exception:

Stack: [0x7dc6,0x7dcb], sp=0x7dcaf0b0, free space=316k
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j java.io.WinNTFileSystem.getSpace0(Ljava/io/File;I)J+0
j java.io.WinNTFileSystem.getSpace(Ljava/io/File;I)J+10
j java.io.File.getUsableSpace()J+34
j org.apache.cassandra.config.DatabaseDescriptor.getDataFileLocationForTable(Ljava/lang/String;JZ)Ljava/lang/String;+44
j org.apache.cassandra.db.Table.getDataFileLocation(JZ)Ljava/lang/String;+6
j org.apache.cassandra.db.Table.getDataFileLocation(J)Ljava/lang/String;+3
j org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(JLjava/lang/String;)Ljava/lang/String;+5
j org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(JJLorg/apache/cassandra/db/commitlog/ReplayPosition;)Lorg/apache/cassandra/io/sstable/SSTableWriter;+18
J org.apache.cassandra.db.Memtable.writeSortedContents(Lorg/apache/cassandra/db/commitlog/ReplayPosition;)Lorg/apache/cassandra/io/sstable/SSTableReader;
j org.apache.cassandra.db.Memtable.access$400(Lorg/apache/cassandra/db/Memtable;Lorg/apache/cassandra/db/commitlog/ReplayPosition;)Lorg/apache/cassandra/io/sstable/SSTableReader;+2
j org.apache.cassandra.db.Memtable$4.runMayThrow()V+36
j org.apache.cassandra.utils.WrappedRunnable.run()V+9
J java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Ljava/lang/Runnable;)V
J java.util.concurrent.ThreadPoolExecutor$Worker.run()V
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub

Java info:
java version 1.6.0_30
Java(TM) SE Runtime Environment (build 1.6.0_30-b12)
Java HotSpot(TM) 64-Bit Server VM (build 20.5-b03, mixed mode)
The use of SuperColumns
There have been mentions that the use of SuperColumns is not really encouraged, with the recommendation to use composite columns instead. Will SuperColumns be removed? Any comments greatly appreciated.

---
CONFIDENTIALITY NOTICE: This e-mail and any files attached may contain confidential and proprietary information of Alcatel-Lucent and/or its affiliated entities. Access by the intended recipient only is authorized. Any liability arising from any party acting, or refraining from acting, on any information contained in this e-mail is hereby excluded. If you are not the intended recipient, please notify the sender immediately, destroy the original transmission and its attachments and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Copyright in this e-mail and any attachments belongs to Alcatel-Lucent and/or its affiliated entities.
Question regarding support of batch_mutate + delete + slice predicate
Investigation of this combination led to the following: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/batch-mutate-deletion-slice-range-predicate-unsupported-td5048309.html

Are there plans (6.x or 7) to support this? Thanks
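Since a Deletion carrying a slice-range predicate is rejected, the usual workaround is to fetch the column names first (get_slice) and then batch per-name deletions. A structure-only sketch using plain dicts to mimic the Thrift mutation map (field names follow the Thrift IDL, but nothing here calls a real client; the helper is mine):

```python
def delete_columns_mutation_map(row_key, column_family, column_names, timestamp):
    # One Deletion whose predicate lists explicit column names --
    # the supported form -- instead of a slice range.
    deletion = {
        "timestamp": timestamp,
        "predicate": {"column_names": list(column_names)},
    }
    return {row_key: {column_family: [{"deletion": deletion}]}}

m = delete_columns_mutation_map("row1", "Standard1", ["a", "b"], 42)
print(m["row1"]["Standard1"][0]["deletion"]["predicate"]["column_names"])  # -> ['a', 'b']
```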
How can the cassandra token of a key be determined from the cassandra client side?
This question relates to a C++ client application that wants to direct each Cassandra request to the correct Cassandra node based on the token form of the key. Would the following work? If the token form of the key could be determined, the keyspace description could be used to select the correct Cassandra node to which to send the request. The first problem is how to determine the token from the key in C++. Thanks much
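For RandomPartitioner the token of a key is deterministic on the client side: Cassandra derives it as the absolute value of the key's MD5 digest treated as a signed big-endian 128-bit integer. A Python sketch of the computation (straightforward to port to C++ with any MD5 implementation):

```python
import hashlib

def random_partitioner_token(key: bytes) -> int:
    # MD5 digest -> signed big-endian 128-bit integer -> absolute value,
    # mirroring Cassandra's BigInteger(md5(key)).abs().
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))

token = random_partitioner_token(b"some-row-key")
assert 0 <= token <= 2**127
print(token)
```

With the token in hand, the client can compare it against the ring's token ranges (from the keyspace/ring description) to pick the node that owns the key.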
RE: Question regarding tombstone removal on 0.6.4
Yes – very helpful – I seemed to have forgotten the grace seconds :)

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Tuesday, August 31, 2010 1:06 PM
To: user@cassandra.apache.org
Subject: Re: Question regarding tombstone removal on 0.6.4

does http://wiki.apache.org/cassandra/DistributedDeletes and http://wiki.apache.org/cassandra/MemtableSSTable help?

On Tue, Aug 31, 2010 at 3:04 PM, Dwight Smith <dwight.sm...@alcatel-lucent.com> wrote:

Hi

I am running a three node cluster; everything works as expected. After running my application for ~60K iterations, I stopped the application, then performed nodetool flush on my keyspace, then a nodetool compact, repeating this for each node in the cluster. Then I exported one of my CFs with sstable2json, and upon examining the json, all the entries had the tombstone flag true. This is correct, since all the application processes completed correctly and the entries should be removed. Now for the question: I understood that a nodetool compact would cause the tombstone entries to be removed – is this true?

Thanks

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
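The timing rule behind the grace-seconds answer: a tombstone only becomes purgeable during compaction once its deletion time is more than GCGraceSeconds in the past, so a compaction run immediately after the deletes will keep them. A minimal sketch of that check (the function and variable names are mine):

```python
def tombstone_purgeable(local_deletion_time, gc_grace_seconds, now):
    # A tombstone survives compaction until gc_grace has elapsed, giving
    # repair time to propagate the delete to all replicas first.
    return local_deletion_time + gc_grace_seconds <= now

gc_grace = 864000  # e.g. 10 days, expressed in seconds
print(tombstone_purgeable(1_000_000, gc_grace, 1_864_000))  # at the boundary -> True
print(tombstone_purgeable(1_000_000, gc_grace, 1_500_000))  # too soon -> False
```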
RE: Newbie to cassandra
Is there a link available for the Cassandra Summit in SF?

From: Eben Hewitt [mailto:eben.hew...@gmail.com]
Sent: Sunday, July 18, 2010 5:02 PM
To: user@cassandra.apache.org
Subject: Re: Newbie to cassandra

Hi Sonia

If you're still interested after the reading and want something more hands-on, Eric Evans is doing a workshop at OSCON next week, if you had a sudden urge to go to Portland in the next 36 hours. Short of that, you can also sign up for just that workshop to view online. There's also a Cassandra Summit one-day event in San Francisco August 10 if you're in that area. If you're interested in something in-depth, Riptano offers one-day training classes around the US – one in New York August 6 and after that in Denver, I believe.

Eben

On Sun, Jul 18, 2010 at 4:48 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
Which is bullet #4 on the list I linked. :)

On Sun, Jul 18, 2010 at 5:31 PM, Bill Hastings <bllhasti...@gmail.com> wrote:
Or perhaps this one. This is the Cassandra paper from the guys at FB. http://www.cs.cornell.edu/projects/ladis2009/program.htm#session3

On Sun, Jul 18, 2010 at 1:21 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
Start with the recommended articles on http://wiki.apache.org/cassandra/ArticlesAndPresentations

On Sun, Jul 18, 2010 at 1:46 PM, sonia gehlot <sonia.geh...@gmail.com> wrote:
Hi everyone, I am new to Cassandra and wanted to try and start learning Cassandra. I have a database background: I am fully exposed to and have full command of Netezza, Oracle, MySQL, Sybase, SQL, etc. – basically all the relational databases. As Cassandra is gaining popularity day by day with its amazing features, I got tempted by it too and wanted to take a deep dive into it. Please help me by guiding me in the right direction. How can I start working with Cassandra? Any help is appreciated. Thanks in advance.

Sonia

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

--
Cheers
Bill

--
In science there are no 'depths'; there is surface everywhere. --Rudolph Carnap
RE: Use of multiple Keyspaces
Thanks – I found on the Wiki that the memtables and sstables are on a per-CF basis. Sorry about the mail client formatting – I have no choice – corporate controlled :) Now I am concerned about the deletions – what areas should I investigate to understand the concerns you raise? Thanks again

-----Original Message-----
From: Benjamin Black [mailto:b...@b3k.us]
Sent: Thursday, July 08, 2010 11:28 AM
To: user@cassandra.apache.org
Subject: Re: Use of multiple Keyspaces

(and I'm sure someone will correct me if I am wrong on that)

On Thu, Jul 8, 2010 at 11:24 AM, Benjamin Black <b...@b3k.us> wrote:
There is a memtable per CF, regardless of how many keyspaces you have.