Re: why do I have to use internal IP for EC2 nodes?

2012-09-05 Thread Robin Verlangen
@Yang: Sounds legit, as internal is not the same as external. Beware of the
fact that internal traffic is only free within the same availability
zone; traffic between availability zones in the same region is charged a
small amount (~ $0.01 per GB).

With kind regards,

Robin Verlangen
Software engineer
W http://www.robinverlangen.nl
E ro...@us2.nl




2012/9/5 Yang tedd...@gmail.com

 thanks, but if the communication between cluster nodes all resolves to
 internal-to-internal, Amazon will not charge the traffic as external
 traffic, right?

 On Tue, Sep 4, 2012 at 7:08 PM, aaron morton aa...@thelastpickle.com wrote:

 See http://aws.amazon.com/articles/1145?_encoding=UTF8&jiveRedirect=1#12

 The external DNS name will resolve to the internal IP when resolved
 internally.

 Using the internal IP means you are not charged for IO and it makes it
 clear you do not expect this service to be accessed from outside.
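
 For example, resolving the public name from inside and then outside EC2
 shows the two answers. A sketch based on Yang's example below; the external
 answer follows from the IP encoded in the hostname:

 $ dig +short ec2-50-17-3-229.compute-1.amazonaws.com  # from an EC2 instance
 10.28.166.83
 $ dig +short ec2-50-17-3-229.compute-1.amazonaws.com  # from outside EC2
 50.17.3.229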

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 5/09/2012, at 7:37 AM, Yang tedd...@gmail.com wrote:

  http://www.datastax.com/docs/1.1/initialize/cluster_init
 
 
  says:
 
  Note: In the - seeds list property, include the internal IP addresses
 of each seed node.
 
  why do I have to use the internal IP?
  on an EC2 node,
  hostname resolution seems to directly give its internal IP:
 
  $ host aws1devbic1.biqa.ctgrd.com
  aws1devbic1.biqa.ctgrd.com is an alias for
 ec2-50-17-3-229.compute-1.amazonaws.com.
  ec2-50-17-3-229.compute-1.amazonaws.com has address 10.28.166.83
 
  so using the public DNS or the internal IP seems to be the same thing, or
 is there something I'm missing?
 
  Thanks
  Yang
 





Re: java.lang.NoClassDefFoundError when trying to do anything on one CF on one node

2012-09-05 Thread Thomas van Neerijnen
Thanks for the help Aaron.
I've checked NodeIdInfo and LocationInfo as below.
What am I looking at? I'm guessing the first row in NodeIdInfo represents
the ring with the node ids, but does the second row perhaps represent dead
nodes with old schemas? That's a total guess; I'd be very interested to know
what it and the LocationInfo are.
If there's anything else you'd like me to check let me know; otherwise I'll
attempt your workaround later today.

[default@system] list NodeIdInfo ;
Using default limit of 100
---
RowKey: 4c6f63616c
= (column=b10552c0-ea0f-11e0--cb1f02ccbcff, value=0a1020d2,
timestamp=1317241393645)
= (column=e64fc8f0-595b-11e1--51be601cd0d7, value=0a1020d2,
timestamp=1329478703871)
= (column=732d4690-a596-11e1--51be601cd09f, value=0a1020d2,
timestamp=1337860139385)
= (column=bffd9d40-aa45-11e1--51be601cd0fe, value=0a1020d2,
timestamp=1338375234836)
= (column=01efa5d0-e133-11e1--51be601cd0ff, value=0a1020d2,
timestamp=1344414498989)
= (column=92109b80-ea0a-11e1--51be601cd0af, value=0a1020d2,
timestamp=1345386691897)
---
RowKey: 43757272656e744c6f63616c
= (column=01efa5d0-e133-11e1--51be601cd0ff, value=0a1020d2,
timestamp=1344414498989)
= (column=92109b80-ea0a-11e1--51be601cd0af, value=0a1020d2,
timestamp=1345386691897)

2 Rows Returned.
Elapsed time: 128 msec(s).
[default@system] list LocationInfo ;
Using default limit of 100
---
RowKey: 52696e67
= (column=00, value=0a1080d2, timestamp=134104900)
= (column=04a7128b6c83505dcd618720f92028f4, value=0a1020b7,
timestamp=1332360971660)
= (column=09249249249249249249249249249249, value=0a1080cd,
timestamp=1341136002862)
= (column=12492492492492492492492492492492, value=0a1020d3,
timestamp=1341135999465)
= (column=1500, value=0a1060d3,
timestamp=134104671)
= (column=1555, value=0a1020d3,
timestamp=1344530188382)
= (column=1b6db6db6db6db6db6db6db6db6db6db, value=0a1020b1,
timestamp=1341135997643)
= (column=1c71c71c71c71bff, value=0a1080d2,
timestamp=1317241889689)
= (column=24924924924924924924924924924924, value=0a1060d3,
timestamp=1341135996555)
= (column=29ff, value=0a1020d3,
timestamp=1317241534292)
= (column=2aaa, value=0a1060d3,
timestamp=1344530187539)
= (column=38e38e38e38e37ff, value=0a1060d3,
timestamp=1317241257569)
= (column=38e38e38e38e38e38e38e38e38e38e38, value=0a1060d3,
timestamp=1343136501647)
= (column=393170e0207a17d8519f0c1bfe325d51, value=0a1020d3,
timestamp=1345381375120)
= (column=3fff, value=0a1080d3,
timestamp=134104939)
= (column=471c71c71c71c71c71c71c71c71c71c6, value=0a1080d3,
timestamp=1343133153701)
= (column=471c71c71c71c7ff, value=0a1080d3,
timestamp=1317241786636)
= (column=49249249249249249249249249249249, value=0a1080d3,
timestamp=1341136002693)
= (column=52492492492492492492492492492492, value=0a106010,
timestamp=1341136002626)
= (column=53ff, value=0a1020d4,
timestamp=1328473688357)
= (column=5554, value=0a1060d4,
timestamp=134104910)
= (column=5b6db6db6db6db6db6db6db6db6db6da, value=0a1060d4,
timestamp=1332389784945)
= (column=5b6db6db6db6db6db6db6db6db6db6db, value=0a1060d4,
timestamp=1341136001027)
= (column=638e38e38e38e38e38e38e38e38e38e2, value=0a1060d4,
timestamp=1343125208462)
= (column=638e38e38e38e3ff, value=0a1060d4,
timestamp=1317241257577)
= (column=6c00, value=0a1020d3,
timestamp=134104789)
---
RowKey: 4c
= (column=436c75737465724e616d65,
value=4d6f6e737465724d696e642050726f6420436c7573746572,
timestamp=1317241251097000)
= (column=47656e65726174696f6e, value=50447e78, timestamp=134104152000)
= (column=50617274696f6e6572,
value=6f72672e6170616368652e63617373616e6472612e6468742e52616e646f6d506172746974696f6e6572,
timestamp=1317241251097000)
= (column=546f6b656e, value=2a00,
timestamp=134104214)
---
RowKey: 436f6f6b696573
=
(column=48696e7473207075726765642061732070617274206f6620757067726164696e672066726f6d20302e362e7820746f20302e37,
value=6f68207965732c20697420746865792077657265207075726765642e,
timestamp=1317241251249)
= (column=5072652d312e302068696e747320707572676564,
value=6f68207965732c2074686579207765726520707572676564,
timestamp=1326274339337)
---
RowKey: 426f6f747374726170
= (column=42, value=01, timestamp=134104213)

4 Rows Returned.
Elapsed time: 34 msec(s).

On Wed, Sep 5, 2012 at 2:42 AM, aaron morton aa...@thelastpickle.com wrote:

 Hmmm, this looks like an error in ctor for NodeId$LocalNodeIdHistory. Are
 there any other ERROR log messages?

 Do you see either of these two messages in the log:
 No saved local node id, using newly generated: {}
 or
 Saved local node id: {}


 Can you use cassandra-cli / cqlsh to print the contents of the 

Re: Practical node size limits

2012-09-05 Thread Віталій Тимчишин
You can try increasing the streaming throttle.
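
For reference, the throttle lives in cassandra.yaml; a sketch with an
arbitrary value (note the unit is megabits per second, not megabytes):

# cassandra.yaml (1.1.x) -- caps outbound streaming (repair, bootstrap, move)
stream_throughput_outbound_megabits_per_sec: 600

The yaml change needs a restart; if your nodetool build has
setstreamthroughput, you can also adjust it at runtime.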

2012/9/4 Dustin Wenz dustinw...@ebureau.com

 I'm following up on this issue, which I've been monitoring for the last
 several weeks. I thought people might find my observations interesting.

 Ever since increasing the heap size to 64GB, we've had no OOM conditions
 that resulted in a JVM termination. Our nodes have around 2.5TB of data
 each, and the replication factor is four. IO on the cluster seems to be
 fine, though I haven't been paying particular attention to any GC hangs.

 The bottleneck now seems to be the repair time. If any node becomes too
 inconsistent, or needs to be replaced, the rebuild time is over a week.
 That issue alone makes this cluster configuration unsuitable for production
 use.

 - .Dustin

 On Jul 30, 2012, at 2:04 PM, Dustin Wenz dustinw...@ebureau.com wrote:

  Thanks for the pointer! It sounds likely that's what I'm seeing. CFStats
 reports that the bloom filter size is currently several gigabytes. Is there
 any way to estimate how much heap space a repair would require? Is it a
 function of simply adding up the filter file sizes, plus some fraction of
 neighboring nodes?
 
  I'm still curious about the largest heap sizes that people are running
 with on their deployments. I'm considering increasing ours to 64GB (with
 96GB physical memory) to see where that gets us. Would it be necessary to
 keep the young-gen size small to avoid long GC pauses? I also suspect that
 I may need to keep my memtable sizes small to avoid long flushes; maybe in
 the 1-2GB range.
 
- .Dustin
 
  On Jul 29, 2012, at 10:45 PM, Edward Capriolo edlinuxg...@gmail.com
 wrote:
 
  Yikes. You should read:
 
  http://wiki.apache.org/cassandra/LargeDataSetConsiderations
 
  Essentially what it sounds like you are now running into is this:
 
  The BloomFilters for each SSTable must exist in main memory. Repair
  tends to create some extra data which normally gets compacted away
  later.
 
  Your best bet, if you need to save the data, is to temporarily raise the
  Xmx heap and adjust the index sampling size (if it is just test
  data you may want to give up and start fresh).
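
  A sketch of those two knobs, using the 1.1-era file names and
  illustrative values:

  # conf/cassandra-env.sh -- temporarily raise the heap
  MAX_HEAP_SIZE="16G"
  HEAP_NEWSIZE="800M"

  # conf/cassandra.yaml -- sample the row index more sparsely to save heap
  index_interval: 512    # default is 128; larger values cost some read speed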
 
  Generally the issue with large disk configurations is that it is hard to
  keep a good ram/disk ratio. Then most reads turn into disk seeks and
  the throughput is low. I get the vibe people believe large stripes are
  going to help Cassandra. The issue is that stripes generally only
  increase sequential throughput, but Cassandra is a random read system.
 
  How much ram/disk you need is case dependent, but a 1/5 ratio of RAM to
  disk is where I think most people want to be, unless their system is
  carrying SSD disks.
 
  Again, you have to keep your bloom filters in java heap memory, so any
  design that tries to create a quadrillion small rows is going to have
  memory issues as well.
 
  On Sun, Jul 29, 2012 at 10:40 PM, Dustin Wenz dustinw...@ebureau.com
 wrote:
  I'm trying to determine if there are any practical limits on the
 amount of data that a single node can handle efficiently, and if so,
 whether I've hit that limit or not.
 
  We've just set up a new 7-node cluster with Cassandra 1.1.2 running
 under OpenJDK6. Each node is a 12-core Xeon with 24GB of RAM and is connected
 to a stripe of 10 3TB disk mirrors (a total of 20 spindles each) via dual
 SATA-3 interconnects. I can read and write around 900MB/s sequentially on
 the arrays. I started out with Cassandra tuned with all-default values, with
 the exception of the compaction throughput, which was increased from 16MB/s
 to 100MB/s. These defaults set the heap size to 6GB.
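
 For reference, the compaction throttle I changed is this knob in
 cassandra.yaml, sketched:

 compaction_throughput_mb_per_sec: 100    # default is 16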
 
  Our schema is pretty simple; only 4 column families and each has one
 secondary index. The replication factor was set to four, and compression
 disabled. Our access patterns are intended to be about equal numbers of
 inserts and selects, with no updates, and the occasional delete.
 
  The first thing we did was begin to load data into the cluster. We
 could perform about 3000 inserts per second, which stayed mostly flat.
 Things started to go wrong around the time the nodes exceeded 800GB.
 Cassandra began to generate a lot of 'mutation messages dropped' warnings,
 and was complaining that the heap was over 75% capacity.
 
  At that point, we stopped all activity on the cluster and attempted a
 repair. We did this so we could be sure that the data was fully consistent
 before continuing. Our mistake was probably trying to repair all of the
 nodes simultaneously - within an hour, Java terminated on one of the nodes
 with a heap out-of-memory message. I then increased all of the heap sizes
 to 8GB, and reduced the heap_newsize to 800MB. All of the nodes were
 restarted, and there was no outside activity on the cluster. I then
 began a repair on a single node. Within a few hours, it OOMed again and
 exited. I then increased the heap to 12GB, and attempted the same thing.
 This time, the repair ran for about 7 hours before 

Re: java.lang.NoClassDefFoundError when trying to do anything on one CF on one node

2012-09-05 Thread Thomas van Neerijnen
forgot to answer your first question. I see this:
INFO 14:31:31,896 No saved local node id, using newly generated:
92109b80-ea0a-11e1--51be601cd0af



Cannot bootstrap new nodes in 1.0.11 ring - schema issue

2012-09-05 Thread Jason Harvey
Hey folks,

I have a 1.0.11 ring running in production with 6 nodes. Trying to 
bootstrap a new node in, and I'm getting the following consistently:

 INFO [main] 2012-09-05 04:24:13,317 StorageService.java (line 668) 
JOINING: waiting for schema information to complete


After waiting for over 30 minutes, I restarted the node to try again, and 
got the same thing. Tried wiping out the data dir on the new node, as well. 
Same result.

Turned on DEBUG, and got the following:

 INFO [main] 2012-09-05 03:58:55,205 StorageService.java (line 668) 
JOINING: waiting for schema information to complete
DEBUG [MigrationStage:1] 2012-09-05 03:59:11,440 
DefinitionsUpdateVerbHandler.java (line 70) Applying UpdateColumnFamily 
from /10.140.128.218
DEBUG [MigrationStage:1] 2012-09-05 03:59:11,440 
DefinitionsUpdateVerbHandler.java (line 80) Migration not applied Previous 
version mismatch. cannot apply.
DEBUG [MigrationStage:1] 2012-09-05 03:59:11,631 
DefinitionsUpdateVerbHandler.java (line 70) Applying UpdateColumnFamily 
from /10.140.128.218
DEBUG [MigrationStage:1] 2012-09-05 03:59:11,631 
DefinitionsUpdateVerbHandler.java (line 80) Migration not applied Previous 
version mismatch. cannot apply.


The logs continue with a bunch of failed migration errors from each node in 
the ring.

So, I'm guessing that there is a schema history problem on one of my nodes? 
Any clues on how I can fix this? I had considered wiping out the schema on 
one of my running nodes and starting it back up, but I'm worried it might 
not come back if it gets the same errors.
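
For what it's worth, the pre-1.1 recipe from the wiki FAQ is to make the
disagreeing node forget its locally stored schema and pull it from a live
node, rather than wiping a healthy one. A sketch, assuming the default data
directory layout:

# on the disagreeing node only
nodetool -h localhost drain     # flush memtables and stop accepting writes
# stop the cassandra process, then move the schema sstables aside:
mkdir -p /tmp/schema-backup
mv /var/lib/cassandra/data/system/Schema*     /tmp/schema-backup/
mv /var/lib/cassandra/data/system/Migrations* /tmp/schema-backup/
# restart cassandra; the node should re-fetch the schema from the ring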


Also as a random question: is there any way to 'merge' historical schema 
changes together?


Thanks,
Jason


Re: configure KeyCache to use Non-Heap memory?

2012-09-05 Thread Ananth Gundabattula
Hello Aaron,

Thanks a lot for the response. Raised a request 
https://issues.apache.org/jira/browse/CASSANDRA-4619

Here is the nodetool dump: (from one of the two nodes in the cluster)

Token: 0
Gossip active: true
Thrift active: true
Load : 147.64 GB
Generation No: 1346635362
Uptime (seconds) : 182707
Heap Memory (MB) : 4884.33 / 8032.00
Data Center  : datacenter1
Rack : rack1
Exceptions   : 0
Key Cache: size 777651120 (bytes), capacity 777651120 (bytes), 44354999 
hits, 98275175 requests, 0.451 recent hit rate, 14400 save period in seconds
Row Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN 
recent hit rate, 0 save period in seconds


Number of rows in the 2 node cluster is 74+ Million



Regards,
Ananth




From: aaron morton aa...@thelastpickle.com
Reply-To: user@cassandra.apache.org
Date: Wednesday, September 5, 2012 11:33 AM
To: user@cassandra.apache.org
Subject: Re: configure KeyCache to use Non-Heap memory?

Is there any way I can configure KeyCache to use Non-Heap memory?
No.
You could add a feature request here 
https://issues.apache.org/jira/browse/CASSANDRA

Could you post some stats on the current key cache size and hit rate? (from
nodetool info)
It would be interesting to know how many keys it contains vs the number of
rows on the box, and the hit rate.
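
As an aside: the key cache has to stay on heap, but on 1.1 you can at least
bound how much heap it takes in cassandra.yaml (the value below is
illustrative):

# cassandra.yaml (1.1)
key_cache_size_in_mb: 512    # empty/default means min(5% of heap, 100MB)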

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/09/2012, at 3:01 PM, Ananth Gundabattula
agundabatt...@threatmetrix.com wrote:


Is there any way I can configure KeyCache to use Non-Heap memory?

We have large memory nodes: ~96GB memory per node, with only 8GB configured
for heap (to avoid GC issues because of a large heap).

We have a constraint with respect to :

 1.  Row cache models don't reflect our data query patterns and hence can only 
optimize on the key cache
 2.  Time constrained to change our schema to be more NO-SQL specific


Regards,
Ananth



Schema Disagreement after migration from 1.0.6 to 1.1.4

2012-09-05 Thread Martin Koch
Hi list

We have a 5-node Cassandra cluster with a single 1.0.9 installation and
four 1.0.6 installations.

We have tried installing 1.1.4 on one of the 1.0.6 nodes (following the
instructions on http://www.datastax.com/docs/1.1/install/upgrading).

After bringing up 1.1.4 there are no errors in the log, but the cluster now
suffers from schema disagreement:

[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
59adb24e-f3cd-3e02-97f0-5b395827453f: [10.10.29.67] - The new 1.1.4 node

943fc0a0-f678-11e1--339cf8a6c1bf: [10.10.87.228, 10.10.153.45,
10.10.145.90, 10.38.127.80] - nodes in the old cluster

The recipe for recovering from schema disagreement (
http://wiki.apache.org/cassandra/FAQ#schema_disagreement) doesn't cover the
new directory layout. The system/Schema directory is empty save for a
snapshots subdirectory. system/schema_columnfamilies and
system/schema_keyspaces contain some files. As described in DataStax's
documentation, we tried running nodetool upgradesstables. When this was done,
describe schema in the cli showed a schema definition which seemed correct,
but was indeed different from the schema on the other nodes in the cluster.

Any clues on how we should proceed?

Thanks,
/Martin Koch


Re: Schema Disagreement after migration from 1.0.6 to 1.1.4

2012-09-05 Thread Edward Sargisson

I would try nodetool resetlocalschema.
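
It drops the node's locally stored schema and reloads it from the rest of
the cluster (available from 1.1). Something like the following, using the
address of the disagreeing node from your describe cluster output:

nodetool -h 10.10.29.67 resetlocalschema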




--

Edward Sargisson

senior java developer
Global Relay

edward.sargis...@globalrelay.net


866.484.6630
New York | Chicago | Vancouver | London (+44.0800.032.9829) | Singapore 
(+65.3158.1301)






SurgeCon 2012

2012-09-05 Thread Chris Burroughs
Surge [1] is a scalability-focused conference hosted in Baltimore in late
September. It's a pretty cool conference with a good mix of
operationally minded people interested in scalability, distributed
systems, systems-level performance and good stuff like that. You should
go! [2]

For those of you who like historical trivia, Mike Malone gave a
well-received Cassandra talk at the first SurgeCon in 2010 [3].

This year there is organised room for BoFs, with several one-hour slots on
Wednesday and Thursday evenings between 9 p.m. and midnight. Last year a few
of us got together informally around lunch time [4].

Interested in getting together again this year?  Think we have critical
mass for a BoF?

[1] http://omniti.com/surge/2012

[2] http://omniti.com/surge/2012/register

[3] http://omniti.com/surge/2010/speakers/mike-malone

[4]
http://mail-archives.apache.org/mod_mbox/cassandra-user/201109.mbox/%3c4e82140a.5070...@gmail.com%3E


Re: unsubscribe

2012-09-05 Thread Rob Coli
http://wiki.apache.org/cassandra/FAQ#unsubscribe

On Wed, Aug 29, 2012 at 3:57 PM, Juan Antonio Gomez Moriano 
mori...@exciteholidays.com wrote:


 --
   Juan Antonio Gomez Moriano
 DEVELOPER TEAM LEADER, Excite Holidays

 T +61 2 8061 2917

 E mori...@exciteholidays.com

 W www.exciteholidays.com
 A Suite 1901, 101 Grafton St, Bondi Junction, NSW 2022, Australia




-- 
=Robert Coli
AIM&GTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: Schema Disagreement after migration from 1.0.6 to 1.1.4

2012-09-05 Thread Omid Aladini
Do you see exceptions like java.lang.UnsupportedOperationException:
Not a time-based UUID in log files of nodes running 1.0.6 and 1.0.9?
Then it's probably due to [1] explained here [2] -- In this case you
either have to upgrade all nodes to 1.1.4 or if you prefer keeping a
mixed-version cluster, the 1.0.6 and 1.0.9 nodes won't be able to join
the cluster again, unless you temporarily upgrade them to 1.0.11.
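
A sketch of the per-node steps for such a rolling upgrade (package and
service names depend on your install):

# one node at a time, letting the ring settle in between
nodetool -h localhost drain      # flush memtables, stop accepting writes
sudo service cassandra stop
# swap in the 1.0.11 (later the 1.1.4) binaries, merge cassandra.yaml changes
sudo service cassandra start
nodetool -h localhost ring       # confirm the node is back Up before moving on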

Cheers,
Omid

[1] https://issues.apache.org/jira/browse/CASSANDRA-1391
[2] https://issues.apache.org/jira/browse/CASSANDRA-4195



Monitoring replication lag/latency in multi DC setup

2012-09-05 Thread Venkat Rama
Hi,

We have a multi-DC Cassandra ring with 2 DCs. We use LOCAL_QUORUM for
writes and reads. The network we have seen between the DCs is sometimes
flaky, with outages lasting from a few minutes to a few tens of minutes.

I wanted to know the best way to measure/monitor either the lag or
replication latency between the data centers. Are there any metrics I can
monitor to find the backlog of data that needs to be transferred?

Thanks in advance.

VR


Re: Monitoring replication lag/latency in multi DC setup

2012-09-05 Thread Mohit Anchlia
As far as I know, Cassandra doesn't use an internal queueing mechanism specific
to replication. Cassandra sends the write to the remote DC and after that it's
up to the TCP/IP stack to deal with buffering. If requests start to time out,
Cassandra will use HH (hinted handoff) for up to a certain time. For longer
outages you would have to run repair.

Also look at tcp/ip tuning parameters that are helpful with your scenario:

http://kaivanov.blogspot.com/2010/09/linux-tcp-tuning.html

Run iperf and test the latency.
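
A sketch of such a test between the DCs (the hostname is hypothetical):

# on a node in the remote DC
iperf -s
# on a node in the local DC: 10-second throughput test against it
iperf -c dc2-node.example.com -t 10
# and round-trip latency
ping -c 10 dc2-node.example.com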

On Wed, Sep 5, 2012 at 8:22 PM, Venkat Rama venkata.s.r...@gmail.com wrote:

 Hi,

  We have a multi-DC Cassandra ring with 2 DCs. We use LOCAL_QUORUM
 for writes and reads. The network we have seen between the DCs is sometimes
 flaky, with outages lasting from a few minutes to a few tens of minutes.

 I wanted to know the best way to measure/monitor either the lag
 or replication latency between the data centers. Are there any metrics I
 can monitor to find the backlog of data that needs to be transferred?

 Thanks in advance.

 VR



Re: Practical node size limits

2012-09-05 Thread Rob Coli
On Sun, Jul 29, 2012 at 7:40 PM, Dustin Wenz dustinw...@ebureau.com wrote:
 We've just set up a new 7-node cluster with Cassandra 1.1.2 running under 
 OpenJDK6.

It's worth noting that the Cassandra project recommends the Sun JRE. Without
the Sun JRE, you might not be able to use JAMM to determine the live
ratio. Very few people use OpenJDK in production, so using it also
increases the likelihood that you might be the first to encounter a
given issue. FWIW!

=Rob

-- 
=Robert Coli
AIM&GTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: Monitoring replication lag/latency in multi DC setup

2012-09-05 Thread Venkat Rama
Thanks for the quick reply, Mohit. Can we measure/monitor the size of
Hinted Handoffs? Would it be a good enough indicator of my backlog?

Although we know when a network is flaky, we are interested in knowing how
much data is piling up in local DC that needs to be transferred.

Greatly appreciate your help.

VR


On Wed, Sep 5, 2012 at 8:33 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

  As far as I know, Cassandra doesn't use an internal queueing mechanism
 specific to replication. Cassandra sends the write to the remote DC and after
 that it's up to the TCP/IP stack to deal with buffering. If requests start
 to time out, Cassandra will use HH (hinted handoff) for up to a certain time.
 For longer outages you would have to run repair.

 Also look at tcp/ip tuning parameters that are helpful with your scenario:

 http://kaivanov.blogspot.com/2010/09/linux-tcp-tuning.html

 Run iperf and test the latency.






Re: Monitoring replication lag/latency in multi DC setup

2012-09-05 Thread Mohit Anchlia
Cassandra exposes a lot of metrics through JMX, which you can browse with
JConsole. You might be able to get some information from there.
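
For pending hints specifically, a rough sketch of what to look at on a 1.0.x
node (CF name per the system keyspace; exact MBean layout varies by version):

# each row in the hints CF is an endpoint with hints waiting to be delivered
nodetool -h localhost cfstats | grep -A 6 'Column Family: HintsColumnFamily'

In JConsole, the org.apache.cassandra.db:type=HintedHandoffManager MBean also
lists endpoints with pending hints.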

On Wed, Sep 5, 2012 at 8:47 PM, Venkat Rama venkata.s.r...@gmail.com wrote:

  Thanks for the quick reply, Mohit. Can we measure/monitor the size of
 Hinted Handoffs? Would it be a good enough indicator of my backlog?

 Although we know when a network is flaky, we are interested in knowing how
 much data is piling up in local DC that needs to be transferred.

 Greatly appreciate your help.

 VR




Re: Schema Disagreement after migration from 1.0.6 to 1.1.4

2012-09-05 Thread Martin Koch
Thanks, this is exactly it. We'd like to do a rolling upgrade - this is a
production cluster - so I guess we'll upgrade 1.0.6 -> 1.0.11 -> 1.1.4,
then.

/Martin

On Thu, Sep 6, 2012 at 2:35 AM, Omid Aladini omidalad...@gmail.com wrote:

 Do you see exceptions like java.lang.UnsupportedOperationException:
 Not a time-based UUID in log files of nodes running 1.0.6 and 1.0.9?
 Then it's probably due to [1] explained here [2] -- In this case you
 either have to upgrade all nodes to 1.1.4 or if you prefer keeping a
 mixed-version cluster, the 1.0.6 and 1.0.9 nodes won't be able to join
 the cluster again, unless you temporarily upgrade them to 1.0.11.

 Cheers,
 Omid

 [1] https://issues.apache.org/jira/browse/CASSANDRA-1391
 [2] https://issues.apache.org/jira/browse/CASSANDRA-4195




Secondary index read/write explanation

2012-09-05 Thread Venkat Rama
Hi All,

I am a newbie to Cassandra and trying to understand how secondary indexes
work. I have been going over the discussion on
https://issues.apache.org/jira/browse/CASSANDRA-749 about local secondary
indexes, and an interesting question on
http://www.mail-archive.com/user@cassandra.apache.org/msg16966.html. The
discussion seems to assume that the most common use cases are ones with range
queries. Is this right?

I am trying to understand the low-cardinality reasoning and how the read
gets executed. I have the following questions; hoping I can explain my
question well :)

1.  When a write request is received, it is written to the base CF and the
secondary index to a secondary (hidden) CF. If this is right, will the
secondary index be written local to the node, or will it follow RP/OPP to
write to other nodes?
2.  When a coordinator receives a read request with, say, predicate x=y where
column x is the secondary index, how does the coordinator query the relevant
node(s)? How does it avoid sending it to all nodes if it is locally indexed?

If there is any article/blog that can help understand this better, please
let me know.
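
For concreteness, here is how a secondary index is declared and queried from
cassandra-cli in 1.x (the CF and column names are made up to match the
question):

[default@demo] create column family users
    with comparator = UTF8Type
    and column_metadata = [{column_name: x, validation_class: UTF8Type,
        index_type: KEYS}];
[default@demo] get users where x = 'y';

As I understand the CASSANDRA-749 design, the index is a node-local hidden CF:
each node indexes only the rows it stores, so the coordinator cannot hash x=y
to a single replica and instead fans the query out across the token ranges.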

Thanks again in advance.

VR


Re: Monitoring replication lag/latency in multi DC setup

2012-09-05 Thread Venkat Rama
Is there a specific metric you can recommend?

VR

On Wed, Sep 5, 2012 at 9:19 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

  Cassandra exposes a lot of metrics through JMX, which you can browse with
 JConsole. You might be able to get some information from there.

