Re: Migrating from a Windows cluster to a Linux cluster.
Hi, we were trying to do a similar kind of migration (to a new cluster, with no downtime) in order to remove a legacy OrderedPartitioner limitation. In the end we were allowed enough downtime to migrate, but originally we were proposing a similar solution based around deploying an update to the application to write to two clusters simultaneously, plus a background copy of older data in some way. I'd love to hear how the migration went, and whether there were any (un)expected hurdles along the way! Thanks, Conan On 24 May 2012 23:56, Rob Coli rc...@palominodb.com wrote: On Thu, May 24, 2012 at 12:44 PM, Steve Neely sne...@rallydev.com wrote: It also seems like a dark deployment of your new cluster is a great method for testing the Linux-based systems before switching your mission-critical traffic over. Monitor them for a while with real traffic and you can have confidence that they'll function correctly when you perform the switchover. FWIW, I would love to see graphs which show their comparative performance under identical write load and then show the cut-over point for reads between the two clusters. My hypothesis is that your Linux cluster will magically be much more performant/less loaded due to many Linux-specific optimizations in Cassandra, but I'd dig seeing this illustrated in an apples-to-apples sense with real app traffic. =Rob -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb
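The dual-write approach Conan describes can be sketched in a few lines. This is a minimal illustration, assuming plain Python dicts as stand-ins for the two clusters; `DualWriteStore` and `backfill` are hypothetical names, not part of any Cassandra client. Writes go to both clusters, reads stay on the old one until they are cut over, and a background pass copies older rows the new cluster has not seen yet.

```python
from typing import Dict, Optional


class DualWriteStore:
    """Hypothetical sketch: every write goes to both clusters, while reads
    come from one cluster that can be switched at cut-over time."""

    def __init__(self, old: Dict[str, str], new: Dict[str, str]) -> None:
        self.old = old              # stand-in for the existing cluster
        self.new = new              # stand-in for the new cluster
        self.read_from_new = False  # flip this at the read cut-over point

    def put(self, key: str, value: str) -> None:
        # Write to both clusters so the new one receives all live traffic.
        self.old[key] = value
        self.new[key] = value

    def get(self, key: str) -> Optional[str]:
        source = self.new if self.read_from_new else self.old
        return source.get(key)


def backfill(old: Dict[str, str], new: Dict[str, str]) -> int:
    """Background copy of older data. Keys already present in the new
    cluster came from dual writes and are newer, so they are not overwritten."""
    copied = 0
    for key, value in list(old.items()):
        if key not in new:
            new[key] = value
            copied += 1
    return copied


old_cluster = {"user:1": "alice"}   # written before the migration started
new_cluster: Dict[str, str] = {}
store = DualWriteStore(old_cluster, new_cluster)
store.put("user:2", "bob")          # live traffic lands on both clusters
backfill(old_cluster, new_cluster)  # background copy of the older row
store.read_from_new = True          # read cut-over
```

Skipping keys that already exist is a simplification for the dict model; against real clusters, a copy that preserves each column's original write timestamp lets Cassandra's last-write-wins reconciliation keep the newer dual-written value instead.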
About Composite range queries
How does Cassandra make it possible to range query on a composite key? For example, with key1 = (A:A:C), (A:B:C), (A:C:C), (A:D:C), (B:A:C), something like get_range(key1, start_column=(A,), end_column=(A, C)) will return [(A:B:C), (A:C:C)] (in pycassa). I mean, does the composite implementation add much overhead to make this work? Does it need additional column families to be able to range query on the individual parts of the composite (the first, second and third components)? What is the real advantage compared to super column families, e.g. key1 = A: (A,C), (B,C), (C,C), (D,C), B: (A,C)? Thx
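The ordering that makes such slices possible can be modeled in plain Python. This is a sketch, not pycassa code: it assumes composite names compare component by component (as Python tuples do) and that both slice bounds are inclusive prefixes; `composite_slice` is a hypothetical helper. Under those assumptions a start of (A,) also picks up (A:A:C), while starting at (A, B) gives exactly the two columns in the question.

```python
from typing import List, Tuple

Composite = Tuple[str, ...]


def composite_slice(columns: List[Composite], start: Composite,
                    finish: Composite) -> List[Composite]:
    """Return the columns between two composite prefixes, both inclusive.

    Composite names sort component by component, like Python tuples. A column
    is kept when its first len(start) components are >= start and its first
    len(finish) components are <= finish, so a shorter prefix matches every
    column that extends it.
    """
    ordered = sorted(columns)
    return [c for c in ordered
            if c[:len(start)] >= start and c[:len(finish)] <= finish]


row = [("A", "A", "C"), ("A", "B", "C"), ("A", "C", "C"),
       ("A", "D", "C"), ("B", "A", "C")]
print(composite_slice(row, ("A",), ("A", "C")))
# prints [('A', 'A', 'C'), ('A', 'B', 'C'), ('A', 'C', 'C')]
print(composite_slice(row, ("A", "B"), ("A", "C")))
# prints [('A', 'B', 'C'), ('A', 'C', 'C')]
```

Because truncating a name to a prefix preserves sort order, both conditions select one contiguous run of the sorted row, so a prefix range can be served by a single slice. A range on the second component alone, across different first components, would not be contiguous in this model.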
RE: All host pools Marked Down
Any takers on this? It's hitting us badly right now. Regards, Shubham From: Shubham Srivastava Sent: Tuesday, May 29, 2012 12:55 PM To: user@cassandra.apache.org Subject: All host pools Marked Down I am getting this exception a lot:
me.prettyprint.hector.api.exceptions.HectorException: All host pools marked down. Retry burden pushed out to client.
As a result, my web app cannot read or write any data from the ring. I have retries set to 3 and can see all 3 retries being exhausted with the same error. I checked cfstats and tpstats, and nothing seems to be a problem. However, in the logs I see a lot of time being spent in compactions, like this:
INFO [CompactionExecutor:73] 2012-05-29 11:03:01,605 CompactionManager.java (line 608) Compacted to /opt/cassandra-data/data/LH/UserPrefrences-tmp-g-8906-Data.db. 36,986,932 to 36,961,554 (~99% of original) bytes for 132,743 keys. Time: 112,910ms.
The time taken here seems pretty high. Will this cause a pause or read timeouts? My web app connects through a hardware load balancer. The Cassandra version is 0.8.6, with a multi-DC ring on 6 nodes, each in one DC. CL: 1 and RF: 3. Memory: 8 GB heap, 14 GB server memory, with an 8-core CPU. How do I move ahead with this? Shubham Srivastava | Technical Lead - Technology Development +91 124 4910 548 | MakeMyTrip.com, 243 SP Infocity, Udyog Vihar Phase 1, Gurgaon, Haryana - 122 016, India
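The compaction log line's own numbers can be turned into rough rates. A minimal sketch, assuming the log format quoted above; `compaction_stats` is a hypothetical helper. It shows that this compaction processed roughly 0.33 MB/s and kept about 99.9% of its input, reclaiming only about 25 KB.

```python
import re

LOG_LINE = ("Compacted to /opt/cassandra-data/data/LH/UserPrefrences-tmp-g-8906-Data.db. "
            "36,986,932 to 36,961,554 (~99% of original) bytes for 132,743 keys. "
            "Time: 112,910ms.")

# Captures: bytes before, bytes after, key count, elapsed milliseconds.
PATTERN = re.compile(
    r"([\d,]+) to ([\d,]+) \(~\d+% of original\) bytes "
    r"for ([\d,]+) keys\. Time: ([\d,]+)ms")


def compaction_stats(line: str) -> dict:
    """Derive rough rates from a 'Compacted to ...' INFO line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a compaction summary line")
    before, after, keys, ms = (int(g.replace(",", "")) for g in m.groups())
    seconds = ms / 1000
    return {
        "bytes_per_sec": before / seconds,
        "keys_per_sec": keys / seconds,
        "size_ratio": after / before,
        "bytes_reclaimed": before - after,
    }


stats = compaction_stats(LOG_LINE)
print(f"{stats['bytes_per_sec'] / 1e6:.2f} MB/s, "
      f"{stats['keys_per_sec']:.0f} keys/s, "
      f"{stats['bytes_reclaimed']:,} bytes reclaimed")
# prints: 0.33 MB/s, 1176 keys/s, 25,378 bytes reclaimed
```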
Re: All host pools Marked Down
Since all hosts seem to be down, Hector will not retry. At least one node in the cluster has to be up. Make sure that your web apps have a proper connection to your cluster. Cem. On Tue, May 29, 2012 at 1:46 PM, Shubham Srivastava shubham.srivast...@makemytrip.com wrote: [snip]
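One way to check that the web app really can reach the cluster is a plain TCP connect, the scripted equivalent of a telnet test. A minimal sketch, assuming the web app goes through the load balancer to Cassandra's default Thrift port (9160); `can_connect` is a hypothetical helper, and the host is whatever address your load balancer exposes. A successful connect only shows that the port is reachable, not that a healthy Cassandra node answers behind it.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds, like a telnet test."""
    try:
        # create_connection resolves the host and connects; the socket is
        # closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, `can_connect("<load balancer IP>", 9160)` from the web app host, and the same call against each node's own address, helps tell a load balancer problem apart from the nodes themselves being down.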
RE: All host pools Marked Down
My web app connects to the load balancer IP, which has the actual nodes in its pool. If the connection breaks for any reason, won't Hector retry to re-establish it? I'd expect it to retry every XX seconds based on retryDownedHostsDelayInSeconds. Regards, Shubham From: cem [cayiro...@gmail.com] Sent: Tuesday, May 29, 2012 6:13 PM To: user@cassandra.apache.org Subject: Re: All host pools Marked Down [snip]
Re: All host pools Marked Down
It should retry, but it doesn't. The exception message itself makes it clear that Hector delegates the retry to the client (*Retry burden pushed out to client*); you can also check the Hector code. I wrote a separate service that retries when this exception occurs. I think you have a problem with your load balancer. Try connecting to it with telnet. Cem. On Tue, May 29, 2012 at 3:06 PM, Shubham Srivastava shubham.srivast...@makemytrip.com wrote: [snip]
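Cem's approach of retrying outside Hector can be sketched as a small wrapper. This is a hedged Python illustration rather than Hector code: `AllHostsDownError` is a hypothetical stand-in for the HectorException quoted above, and `with_retry` is a hypothetical helper. Backing off exponentially between attempts gives the load balancer or the nodes time to recover instead of hammering them.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class AllHostsDownError(Exception):
    """Hypothetical stand-in for Hector's 'All host pools marked down' exception."""


def with_retry(op: Callable[[], T], attempts: int = 5, base_delay: float = 0.5,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op(), retrying with exponential backoff while every host is marked down."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except AllHostsDownError:
            if attempt == attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(delay)
            delay *= 2  # wait twice as long before the next attempt
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Injecting `sleep` keeps the wrapper easy to test; in a web app you would also cap the total wait so requests fail fast rather than queue up behind a dead load balancer.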
Re: Nodetool talking to an old IP address (and timing out)
I'm afraid that did not work. I'm running JMX on port 7199 (the default), and I verified that the port is open and accepting connections. Here's what I'm seeing:
dmuth@devteam:~/cliq (production) $ nodetool --host localhost --port 7199 ring
Error connection to remote JMX agent!
java.rmi.ConnectException: Connection refused to host: 10.244.207.16; nested exception is:
    java.net.ConnectException: Connection timed out
    at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
[snip]
I'm guessing that the old IP address of our machine is cached somewhere inside Cassandra or something related. Does anyone have suggestions on where else I can check, or any debugging ideas? Thanks, -- Doug
On Sun, May 27, 2012 at 1:45 PM, Cyril Auburtin cyril.aubur...@gmail.com wrote: Specify the JMX port to nodetool; it is hard-coded in conf/cassandra-env.sh:
nodetool -h localhost -p [jmx port] ring
2012/5/27 Douglas Muth doug.m...@gmail.com: Hi folks, I'm a relative newbie to Cassandra, and have been trying to get up to speed on it so that I can start using it at $WORK. I ran into an interesting issue the other day with nodetool. I currently have Cassandra running on an Amazon EC2 instance running Ubuntu 10.10. At one point, I rebooted the system, and it looks like any attempt to use nodetool to talk to localhost instead tries to connect to the old IP address of the machine! (EC2 instances get a new IP after shutdown/startup.) When I try to run nodetool now, it times out after about 10 seconds with an error like this:
dmuth@devteam:~ $ nodetool --host localhost ring
Error connection to remote JMX agent!
java.rmi.ConnectException: Connection refused to host: 10.244.207.16; nested exception is:
    java.net.ConnectException: Connection timed out
And I've verified that the IP of the machine does NOT in fact end in .16:
dmuth@devteam:~ $ ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 12:31:3d:14:6a:84
          inet addr:10.84.117.110  Bcast:10.84.117.255  Mask:255.255.255.0
I checked the configuration file for Cassandra and verified that it does in fact have the new IP address in it. I also made sure that there was nothing weird in /etc/hosts. Also, cqlsh works just fine, as does the Helenus client for node.js; I can talk to our Cassandra instance through either of those. I'm out of ideas at this point. Does anyone have any other suggestions for what I should investigate on my system? Thanks, -- Doug http://twitter.com/dmuth
Re: Nodetool talking to an old IP address (and timing out)
8 hours, 1 cup of coffee, and 4 Advil later, I think I got to the bottom of this. Not having much of a Java or JMX background, I'll try to explain it as best I can. To recap, my machine originally had the IP address 10.244.207.16. Then I shut down and restarted that EC2 instance, and it came back with the IP 10.84.117.110. During this, Cassandra was fine -- I could still connect to 127.0.0.1 with cqlsh and the Helenus node.js module. Things got weird only when I tried to use nodetool to manage the instance. As best I can tell, here's the algorithm that nodetool uses when connecting to a Cassandra instance:
Step 1) Connect to the hostname and port specified on the command line (localhost and 7199 are the defaults).
Step 2) Cassandra, which noted the machine's hostname at boot time, tells nodetool to go connect to that hostname instead!
After further investigation, it seems that after my instance was rebooted, the file /etc/hostname was not updated. It still had the value ip-10-244-207-16.ec2.internal in it. This means that any attempt to connect to Cassandra involved Cassandra telling nodetool, "Hey, go talk to 10.244.207.16 instead." And that's where things went wrong. The permanent fix was to change the hostname to localhost and restart Cassandra. The fact that Cassandra notes the hostname at startup was one thing that made this so difficult to track down. I did not see the old IP anywhere in the Cassandra configuration (or in logfile output), so I did not think anything abnormal was happening in the instance. While I'm sure there's a good reason for this behavior, it is very confusing to a Cassandra newbie such as myself, and I'll bet others have been affected by it as well. In the future, I think some logging of this logic, or perhaps a --verbose mode for nodetool, would be a really good idea. What do other folks think?
-- Doug http://twitter.com/dmuth On Tue, May 29, 2012 at 12:08 PM, Douglas Muth doug.m...@gmail.com wrote: I'm afraid that did not work. I'm running JMX on port 7199 (the default) and I verified that the port is open and accepting connections. [snip]
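The mismatch Doug found can be checked for directly. A minimal sketch, assuming the EC2 internal hostname convention shown above (ip-10-244-207-16.ec2.internal); the helper names are hypothetical, and the current IP would come from `ifconfig` as in Doug's first message. It only flags a hostname whose embedded address no longer matches the machine's address.

```python
import re
import socket
from typing import Optional

# EC2-style internal names embed the private IP: ip-A-B-C-D[.domain]
EC2_INTERNAL = re.compile(r"^ip-(\d+)-(\d+)-(\d+)-(\d+)(\..*)?$")


def ip_from_ec2_hostname(hostname: str) -> Optional[str]:
    """Extract the IP embedded in a name like ip-10-244-207-16.ec2.internal."""
    m = EC2_INTERNAL.match(hostname)
    return ".".join(m.groups()[:4]) if m else None


def hostname_is_stale(hostname: str, current_ip: str) -> bool:
    """True if the hostname embeds an IP that differs from the current IP."""
    embedded = ip_from_ec2_hostname(hostname)
    return embedded is not None and embedded != current_ip


if __name__ == "__main__":
    name = socket.gethostname()
    print(name, ip_from_ec2_hostname(name))
```

With Doug's values, `hostname_is_stale("ip-10-244-207-16.ec2.internal", "10.84.117.110")` is True, which is exactly the situation that sent nodetool to the old address.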
Re: Confusion regarding the terms replica and replication factor
OK, now I am confused :). If I have the following: placement_strategy = 'NetworkTopologyStrategy' and strategy_options = {DC1:R1,DC2:R1,DC3:R1 }, does this mean that in each of my datacenters I will have one full replica that can also be a seed node? If I have 3 nodes in addition to the DC replicas, then with normal token calculations a key can be in any datacenter plus on each of the replicas, right? It will show 12 nodes total in its ring. On Thu, May 24, 2012 at 2:39 AM, aaron morton aa...@thelastpickle.com wrote: This is partly historical. NTS (as it is now) has not always existed and was not always the default. In days gone by used to be a fella could run a mighty fine key-value store using just a Simple Replication Strategy. A different way to visualise it is a single ring with a Z axis for the DCs. When you look at the ring from the top you can see all the nodes. When you look at it from the side you can see the nodes are on levels that correspond to their DC. Simple Strategy looks at the ring from the top. NTS works through the layers of the ring. If the hierarchy is Cluster - DataCenter - Node, why exactly do we need globally unique node tokens even though nodes are at the lowest level in the hierarchy? Nodes having a DC is a feature of *some* snitches and utilised by *some* of the replication strategies (and by the messaging system for network efficiency). For background, mapping from row tokens to nodes is based on http://en.wikipedia.org/wiki/Consistent_hashing Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 24/05/2012, at 1:07 AM, java jalwa wrote: Thanks Aaron. That makes things clear. So I guess the 0 - 2^127 range for tokens corresponds to a cluster-level, top-level ring, and then you add some logic on top of that with NTS to logically segment that range into sub-rings as per the notion of data clusters defined in NTS. What's the advantage of having a single top-level ring?
Intuitively, it seems like each replication group could have a separate ring so that the same tokens can be assigned to nodes in different DCs. If the hierarchy is Cluster - DataCenter - Node, why exactly do we need globally unique node tokens even though nodes are at the lowest level in the hierarchy? Thanks again. On Wed, May 23, 2012 at 3:14 AM, aaron morton aa...@thelastpickle.com wrote: Now if a row key hash is mapped to a range owned by a node in DC3, will the node in DC3 still store the key as determined by the partitioner and then walk the ring and store 2 replicas each in DC1 and DC2? No, only nodes in the DCs specified in the NTS configuration will be replicas. Or will the co-ordinator node be aware of the replica placement strategy, and override the partitioner's decision and walk the ring until it first encounters a node in DC1 or DC2, and then place the remaining replicas? The NTS considers each DC to have its own ring. This can make token selection in a multi-DC environment confusing at times. There is something in the DS docs about it. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 23/05/2012, at 3:16 PM, java jalwa wrote: Hi all, I am a bit confused regarding the terms replica and replication factor. Assume that I am using RandomPartitioner and NetworkTopologyStrategy for replica placement. From what I understand, with a RandomPartitioner, a row key will always be hashed and be stored on the node that owns the range to which the key is mapped. http://www.datastax.com/docs/1.0/cluster_architecture/replication#networktopologystrategy. The example here talks about having 2 data centers and a replication factor of 4 with 2 replicas in each datacenter, so the strategy is configured as DC1:2 and DC2:2. Now suppose I add another datacenter DC3, and do not change the NetworkTopologyStrategy.
Now if a row key hash is mapped to a range owned by a node in DC3, will the node in DC3 still store the key as determined by the partitioner and then walk the ring and store 2 replicas each in DC1 and DC2? Will that mean that I will then have 5 replicas in the cluster and not 4? Or will the co-ordinator node be aware of the replica placement strategy, and override the partitioner's decision and walk the ring until it first encounters a node in DC1 or DC2, and then place the remaining replicas? Thanks.
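Aaron's point that NTS treats each DC as its own ring can be illustrated with a simplified placement routine. This is a sketch under stated assumptions, not Cassandra's implementation: it ignores racks, uses one token per node, and `dc_tokens` and `nts_replicas` are hypothetical names. The small per-DC token offset in `dc_tokens` is one common way of keeping tokens unique on the single cluster-wide ring; check the DataStax docs Aaron mentions for the recommended scheme.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

RING_SIZE = 2 ** 127  # RandomPartitioner tokens fall in 0 .. 2**127

Node = Tuple[int, str, str]  # (token, node name, data center)


def dc_tokens(node_count: int, offset: int) -> List[int]:
    """Evenly spaced tokens for one DC, shifted by a small per-DC offset so
    that tokens stay unique across the single cluster-wide ring."""
    return [i * RING_SIZE // node_count + offset for i in range(node_count)]


def nts_replicas(ring: List[Node], key_token: int,
                 strategy_options: Dict[str, int]) -> List[str]:
    """Simplified NTS placement (racks ignored): for each DC named in
    strategy_options, walk clockwise from the key's token and take the first
    N nodes of that DC. DCs that are not named get no replicas."""
    tokens = [token for token, _, _ in ring]
    # The first node whose token is >= the key token owns the key (wrapping).
    start = bisect_left(tokens, key_token) % len(ring)
    replicas: List[str] = []
    for dc, count in strategy_options.items():
        if count <= 0:
            continue
        found = 0
        for step in range(len(ring)):
            _, name, node_dc = ring[(start + step) % len(ring)]
            if node_dc == dc:
                replicas.append(name)
                found += 1
                if found == count:
                    break
    return replicas


ring = sorted(
    (token, f"{dc}-n{i}", dc)
    for dc, offset in (("DC1", 0), ("DC2", 1), ("DC3", 2))
    for i, token in enumerate(dc_tokens(2, offset))
)
# Token 2 is owned by DC3-n0, yet DC3 is absent from the options.
print(nts_replicas(ring, 2, {"DC1": 2, "DC2": 2}))
# prints ['DC1-n1', 'DC1-n0', 'DC2-n1', 'DC2-n0']
```

With DC3 left out of strategy_options, a key whose token falls in a DC3 node's range still gets exactly four replicas, two in DC1 and two in DC2, which matches Aaron's answer.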
nodetool move 0 gets stuck in moving state forever
If the node with token 0 dies and we just want it gone from the cluster, we run nodetool move 0. When we then monitor with nodetool ring, it seems to be stuck in Moving forever. Any ideas?
Re:nodetool move 0 gets stuck in moving state forever
To remove it, use removetoken. -- Original -- From: Poziombka, Wade L; Date: May 30, 2012 (Wed) 5:29 To: user@cassandra.apache.org; Subject: nodetool move 0 gets stuck in moving state forever [snip]
Re: commitlog_sync_batch_window_in_ms change in 0.7
On Mon, May 28, 2012 at 6:53 AM, osishkin osishkin osish...@gmail.com wrote: I've been experimenting with Cassandra 0.7 for some time now. I feel obligated to recommend that you upgrade to Cassandra 1.1. Cassandra 0.7 is better than 0.6, but I definitely still wouldn't be experimenting with this old version in 2012. =Rob -- =Robert Coli
Re: commitlog_sync_batch_window_in_ms change in 0.7
Also run nodetool disablegossip to stop other nodes from sending requests to the one you are about to shut down. "I can shut down my cluster, but I don't want the nodes to ignore it due to some schema misconfiguration etc. when I bring it up again." If you do a rolling restart, the *cluster* will not lose any writes, but individual nodes will. This is by design. Hinted Handoff, Read Repair and the Consistency Level will take care of things. Also, +1 for using Cassandra 1.1. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 29/05/2012, at 3:32 AM, Pierre Chalamet wrote: Hi, using nodetool on each node, one by one:
1. disablethrift
2. drain
3. Shut down your daemon
4. Modify the config
5. Restart the node
You won't lose the data on your nodes. Clients might see a node down, but that is usually not a problem if your C* client is smart enough. You also won't lose updates while performing these operations if the CL is QUORUM (and you have enough replicas). --Original Message-- From: osishkin osishkin To: user@cassandra.apache.org ReplyTo: user@cassandra.apache.org ReplyTo: osish...@gmail.com Subject: commitlog_sync_batch_window_in_ms change in 0.7 Sent: May 28, 2012 15:53 I've been experimenting with Cassandra 0.7 for some time now. I want to increase the value of commitlog_sync_batch_window_in_ms without losing previous data. I can shut down my cluster, but I don't want the nodes to ignore it due to some schema misconfiguration etc. when I bring it up again. I apologize if this was asked before, but I did not see a clear guide for achieving something like this. Can someone please help? Thank you - Pierre
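Pierre's per-node procedure, with Aaron's disablegossip step added, can be scripted roughly like this. A sketch under assumptions: nodetool is on the PATH and reaches each node with `-h`; the stop, config, start and wait callbacks are hypothetical hooks for however your nodes are managed; and running disablegossip between disablethrift and drain is an assumption, since the thread does not pin down that ordering.

```python
import subprocess
from typing import Callable, Iterable

# nodetool subcommands run before stopping each node: Pierre's
# disablethrift and drain, plus Aaron's disablegossip (ordering assumed).
PRE_SHUTDOWN = ("disablethrift", "disablegossip", "drain")


def rolling_restart(hosts: Iterable[str],
                    stop_node: Callable[[str], None],
                    update_config: Callable[[str], None],
                    start_node: Callable[[str], None],
                    wait_until_up: Callable[[str], None],
                    run: Callable[..., object] = subprocess.run) -> None:
    """Restart one node at a time so the rest of the cluster keeps serving."""
    for host in hosts:
        for sub in PRE_SHUTDOWN:
            # check=True stops the whole procedure if a nodetool call fails.
            run(["nodetool", "-h", host, sub], check=True)
        stop_node(host)       # hypothetical hook, e.g. your init script
        update_config(host)   # hypothetical hook, e.g. edit cassandra.yaml
        start_node(host)      # hypothetical hook
        wait_until_up(host)   # hypothetical hook: block until the node is back
```

Waiting for each node to rejoin before touching the next one is what keeps enough replicas available for QUORUM throughout, as Pierre notes.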
Re: Nodetool talking to an old IP address (and timing out)
Did you open inbound ports 1024-65535 in the Security Group? JMX uses two connections: one on port 7199 (by default) for accepting connection requests, and another on a random port between 1024 and 65535 chosen at run time. Nodetool runs over JMX. Patrick. -----Original Message----- From: Douglas Muth Sent: Tuesday, May 29, 2012 11:39 AM To: user@cassandra.apache.org Subject: Re: Nodetool talking to an old IP address (and timing out) [snip]
Re: commitlog_sync_batch_window_in_ms change in 0.7
You'd be better off using version 1.0.9 (I'm using it in production) or 1.0.10. Unfortunately, 1.1 is still a bit young to be ready for production. --Original Message-- From: Rob Coli To: user@cassandra.apache.org To: osish...@gmail.com ReplyTo: user@cassandra.apache.org Subject: Re: commitlog_sync_batch_window_in_ms change in 0.7 Sent: May 30, 2012 03:12 [snip] - Pierre