Re: sstableloader Could not retrieve endpoint ranges
I want to follow up on this thread to describe what I was able to get working. My goal was to switch a cluster to vnodes while preserving the data for a single table, endpoints.endpoint_messages. Otherwise, I could afford to start from a clean slate. As should be apparent, I could also afford to do this within a maintenance window where the cluster was down. In other words, I had the luxury of not having to add a new data center to a live cluster per DataStax's documented procedure for enabling vnodes:

http://docs.datastax.com/en/cassandra/1.2/cassandra/configuration/configVnodesProduction_t.html
http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configVnodesProduction_t.html

What I got working relies on the nodetool snapshot command to create SSTable snapshots under endpoints/endpoint_messages/snapshots/SNAPSHOT_NAME. The snapshots are what is backed up and restored from; the backup and restore never works against the original SSTables in the various endpoints/endpoint_messages/ directories.

- endpoints/endpoint_messages/snapshots/SNAPSHOT_NAME/: These SSTables are copied off and restored from.
- endpoints/endpoint_messages/: These SSTables are the source of the snapshots but are not themselves copied off or restored from.

Instead of using sstableloader to load the snapshots into the re-initialized Cassandra cluster, I used the JMX StorageService.bulkLoad command after establishing a JConsole session to each node. I copied off the snapshots to a directory path ending in endpoints/endpoint_messages/ to give the bulk-loader the layout it expects. The directory that serves as the destination for nodetool snapshot and the source for StorageService.bulkLoad is on the same host as the Cassandra node but outside the purview of the Cassandra node.

This procedure can be summarized as follows:

1. For each node, create a snapshot of the endpoint_messages table as a backup.
2. Stop the cluster.
3. On each node, wipe all the data, i.e. the contents of data_file_directories, the commitlog directory, and the saved_caches directory.
4. Deploy the cassandra.yaml configuration that makes the switch to vnodes and restart the cluster to apply the change.
5. Re-create the endpoints keyspace.
6. On each node, bulk-load the snapshots for that particular node.

This summary can be reduced even further:

1. On each node, export the data to preserve.
2. On each node, wipe the data.
3. On all nodes, switch to vnodes.
4. On each node, import back in the exported data.

I'm sure this process could have been streamlined. One caveat for anyone looking to emulate this: our situation might have been a little easier to reason about because our original endpoint_messages table had a replication factor of 1. We used the vnodes switch as an opportunity to raise the RF to 3.

I can only speculate as to why what I was originally attempting wasn't working. But what I was originally attempting wasn't precisely the use case I care about. What I'm following up with now was.

On Fri, Jun 19, 2015 at 8:22 PM, Mitch Gitman mgit...@gmail.com wrote:

I checked the system.log for the Cassandra node that I did the jconsole JMX session against and which had the data to load. Lots of log output indicating that it's busy loading the files. Lots of stacktraces indicating a broken pipe. I have no reason to believe there are connectivity issues between the nodes, but verifying that is beyond my expertise.
What's indicative is this last bit of log output:

INFO [Streaming to /10.205.55.101:5] 2015-06-19 21:20:45,441 StreamReplyVerbHandler.java (line 44) Successfully sent /srv/cas-snapshot-06-17-2015/endpoints/endpoint_messages/endpoints-endpoint_messages-ic-34-Data.db to /10.205.55.101
INFO [Streaming to /10.205.55.101:5] 2015-06-19 21:20:45,457 OutputHandler.java (line 42) Streaming session to /10.205.55.101 failed
ERROR [Streaming to /10.205.55.101:5] 2015-06-19 21:20:45,458 CassandraDaemon.java (line 253) Exception in thread Thread[Streaming to /10.205.55.101:5,5,RMI Runtime]
java.lang.RuntimeException: java.io.IOException: Broken pipe
    at com.google.common.base.Throwables.propagate(Throwables.java:160)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Broken pipe
    at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
    at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:433)
    at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:565)
    at org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:93)
    at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
    at ...
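The per-node export/wipe/import procedure summarized above can be sketched as the commands each step would run. This is a minimal illustration, not the author's exact scripts: the host name, snapshot tag, and staging path are hypothetical, and nodetool flag ordering should be verified against your Cassandra version.

```python
# Sketch of the per-node commands behind the backup/wipe/bulk-load
# procedure above. Host name, snapshot tag, and staging path are
# hypothetical examples.

def snapshot_cmd(host, keyspace="endpoints", tag="pre_vnodes"):
    # Step 1: snapshot the keyspace holding the table to preserve.
    return ["nodetool", "-h", host, "snapshot", "-t", tag, keyspace]

def dirs_to_wipe(data_dir="/var/lib/cassandra"):
    # Step 3: contents to clear on each node while the cluster is down
    # (data_file_directories, commitlog, saved_caches).
    return [f"{data_dir}/{d}" for d in ("data", "commitlog", "saved_caches")]

def bulkload_path(staging_root, keyspace="endpoints", table="endpoint_messages"):
    # Step 6: StorageService.bulkLoad expects a source path that ends
    # with the keyspace/table/ pair.
    return f"{staging_root}/{keyspace}/{table}"

print(" ".join(snapshot_cmd("node1.example.com")))
```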
Re: sstableloader Could not retrieve endpoint ranges
Fabien, thanks for the reply. We do have Thrift enabled. From what I can tell, the "Could not retrieve endpoint ranges:" error crops up under various circumstances.

From further reading on sstableloader, it occurred to me that it might be a safer bet to use the JMX StorageService bulkLoad command, considering that the data to import was already on one of the Cassandra nodes, just in an arbitrary directory outside the Cassandra data directories. I was able to get this bulkLoad command to fail with a message that the directory structure did not follow the expected keyspace/table/ pattern. So I created a keyspace directory, then a table directory within that, and moved all the files under the table directory. Executed bulkLoad, passing in that directory. It succeeded. Then I ran a nodetool refresh on the table in question.

Only one problem. If I then went to query the table for, well, anything, nothing came back. And this was after successfully querying the table before, and truncating the table just prior to the bulkLoad, so I knew that only the data coming from the bulkLoad could show up there. Oh, and for good measure, I stopped and started all the nodes too. No luck still. What's puzzling about this is that the bulkLoad silently succeeds, even though it doesn't appear to be doing anything. I haven't bothered yet to check the Cassandra logs.

On Fri, Jun 19, 2015 at 12:28 AM, Fabien Rousseau fabifab...@gmail.com wrote:

Hi, I already got this error on a 2.1 cluster because Thrift was disabled. So you should check that Thrift is enabled and accessible from the sstableloader process. Hope this helps. Fabien

On June 19, 2015 at 05:44, Mitch Gitman mgit...@gmail.com wrote:

I'm using sstableloader to bulk-load a table from one cluster to another. I can't just copy sstables because the clusters have different topologies. While we're looking to upgrade soon to Cassandra 2.0.x, we're on Cassandra 1.2.19. The source data comes from a nodetool snapshot.
Here's the command I ran:

sstableloader -d IP_ADDRESSES_OF_SEED_NODES /SNAPSHOT_DIRECTORY/

Here's the result I got:

Could not retrieve endpoint ranges:
 -pr,--principal             kerberos principal
 -k,--keytab                 keytab location
 --ssl-keystore              ssl keystore location
 --ssl-keystore-password     ssl keystore password
 --ssl-keystore-type         ssl keystore type
 --ssl-truststore            ssl truststore location
 --ssl-truststore-password   ssl truststore password
 --ssl-truststore-type       ssl truststore type

Not sure what to make of this, what with the hints at security arguments that pop up. The source and destination clusters have no security. Hoping this might ring a bell with someone out there.
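The email above found that bulkLoad rejects its input unless the files sit under a keyspace/table/ directory pair. A small sketch of that staging step, assuming hypothetical paths (the keyspace and table names come from the thread):

```python
import shutil
from pathlib import Path

def stage_for_bulkload(snapshot_dir, staging_root,
                       keyspace="endpoints", table="endpoint_messages"):
    # Copy snapshot SSTable components into the keyspace/table/ layout
    # that the JMX StorageService.bulkLoad operation expects as input.
    dest = Path(staging_root) / keyspace / table
    dest.mkdir(parents=True, exist_ok=True)
    for f in Path(snapshot_dir).iterdir():
        if f.is_file():  # -Data.db, -Index.db, -Statistics.db, ...
            shutil.copy2(f, dest / f.name)
    return dest
```

The returned path (ending in endpoints/endpoint_messages) is what would then be passed to bulkLoad.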
Re: sstableloader Could not retrieve endpoint ranges
I checked the system.log for the Cassandra node that I did the jconsole JMX session against and which had the data to load. Lots of log output indicating that it's busy loading the files. Lots of stacktraces indicating a broken pipe. I have no reason to believe there are connectivity issues between the nodes, but verifying that is beyond my expertise. What's indicative is this last bit of log output:

INFO [Streaming to /10.205.55.101:5] 2015-06-19 21:20:45,441 StreamReplyVerbHandler.java (line 44) Successfully sent /srv/cas-snapshot-06-17-2015/endpoints/endpoint_messages/endpoints-endpoint_messages-ic-34-Data.db to /10.205.55.101
INFO [Streaming to /10.205.55.101:5] 2015-06-19 21:20:45,457 OutputHandler.java (line 42) Streaming session to /10.205.55.101 failed
ERROR [Streaming to /10.205.55.101:5] 2015-06-19 21:20:45,458 CassandraDaemon.java (line 253) Exception in thread Thread[Streaming to /10.205.55.101:5,5,RMI Runtime]
java.lang.RuntimeException: java.io.IOException: Broken pipe
    at com.google.common.base.Throwables.propagate(Throwables.java:160)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Broken pipe
    at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
    at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:433)
    at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:565)
    at org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:93)
    at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    ... 3 more

And then right after that I see what appears to be the output from the nodetool refresh:

INFO [RMI TCP Connection(2480)-10.2.101.114] 2015-06-19 21:22:56,877 ColumnFamilyStore.java (line 478) Loading new SSTables for endpoints/endpoint_messages...
INFO [RMI TCP Connection(2480)-10.2.101.114] 2015-06-19 21:22:56,878 ColumnFamilyStore.java (line 524) No new SSTables were found for endpoints/endpoint_messages

Notice that Cassandra hasn't found any new SSTables, even though it was just so busy loading them. What's also noteworthy is that the output from the originating node shows it successfully sent endpoints-endpoint_messages-ic-34-Data.db to another node. But then in the system.log for that destination node, I see no mention of that file. What I do see on the destination node are a few INFO messages about streaming one of the .db files, and every time that's immediately followed by an error message:

INFO [Thread-108] 2015-06-19 21:20:45,453 StreamInSession.java (line 142) Streaming of file /srv/cas-snapshot-06-17-2015/endpoints/endpoint_messages/endpoints-endpoint_messages-ic-26-Data.db sections=1 progress=0/105137329 - 0% for org.apache.cassandra.streaming.StreamInSession@46c039ef failed: requesting a retry.
ERROR [Thread-109] 2015-06-19 21:20:45,456 CassandraDaemon.java (line 253) Exception in thread Thread[Thread-109,5,main]
java.lang.RuntimeException: java.nio.channels.AsynchronousCloseException
    at com.google.common.base.Throwables.propagate(Throwables.java:160)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.AsynchronousCloseException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:205)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:412)
    at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:203)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
    at org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:151)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    ... 1 more

I don't know, I'm seeing enough flakiness here as to consider Cassandra bulk-loading a lost cause, even if there is something wrong and fixable about my particular cluster. On to exporting and re-importing data at the proprietary application level. Life is too short.

On Fri, Jun 19, 2015 at 2:40 PM, Mitch Gitman mgit...@gmail.com wrote:

Fabien, thanks for the reply. We do have Thrift enabled. From what I can tell, the "Could not retrieve endpoint ranges:" error crops up under various circumstances. From further reading on sstableloader, it occurred to me that it might be a safer bet to use the JMX StorageService bulkLoad command, considering that the data to import was already on one of the Cassandra nodes, just in an arbitrary directory outside the Cassandra data directories. I was able to get this bulkLoad command to fail with a message that the directory structure did not follow the expected keyspace/table/ pattern. So I created
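The "No new SSTables were found" message above is consistent with how nodetool refresh works: it scans only the table's live data directory, so SSTables sitting in a snapshot or staging area are invisible to it. A quick sketch of that check, with a hypothetical data directory path:

```python
from pathlib import Path

def refresh_candidates(data_dir, keyspace="endpoints", table="endpoint_messages"):
    # nodetool refresh picks up SSTables already present in the table's
    # live data directory; files elsewhere (e.g. a snapshot staging
    # area such as /srv/cas-snapshot-.../) are not seen by it.
    table_dir = Path(data_dir) / keyspace / table
    return sorted(p.name for p in table_dir.glob("*-Data.db"))
```

An empty result from this listing before running refresh would predict exactly the "No new SSTables were found" log line.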
Re: sstableloader Could not retrieve endpoint ranges
Hi, I already got this error on a 2.1 cluster because Thrift was disabled. So you should check that Thrift is enabled and accessible from the sstableloader process. Hope this helps. Fabien

On June 19, 2015 at 05:44, Mitch Gitman mgit...@gmail.com wrote:

I'm using sstableloader to bulk-load a table from one cluster to another. I can't just copy sstables because the clusters have different topologies. While we're looking to upgrade soon to Cassandra 2.0.x, we're on Cassandra 1.2.19. The source data comes from a nodetool snapshot. Here's the command I ran:

sstableloader -d IP_ADDRESSES_OF_SEED_NODES /SNAPSHOT_DIRECTORY/

Here's the result I got:

Could not retrieve endpoint ranges:
 -pr,--principal             kerberos principal
 -k,--keytab                 keytab location
 --ssl-keystore              ssl keystore location
 --ssl-keystore-password     ssl keystore password
 --ssl-keystore-type         ssl keystore type
 --ssl-truststore            ssl truststore location
 --ssl-truststore-password   ssl truststore password
 --ssl-truststore-type       ssl truststore type

Not sure what to make of this, what with the hints at security arguments that pop up. The source and destination clusters have no security. Hoping this might ring a bell with someone out there.
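Fabien's suggestion, confirming that Thrift is reachable from wherever sstableloader runs, can be checked with a plain TCP probe. This is a generic sketch: 9160 is the default Thrift rpc_port, and the host name would be one of your nodes.

```python
import socket

def port_reachable(host, port=9160, timeout=2.0):
    # True if a TCP connection to host:port succeeds. sstableloader on
    # 1.2/2.x needs to reach the Thrift rpc_port (default 9160) on the
    # nodes passed via -d.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running this from the sstableloader host against each seed node would distinguish a Thrift/firewall problem from the other causes of "Could not retrieve endpoint ranges".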
sstableloader Could not retrieve endpoint ranges
I'm using sstableloader to bulk-load a table from one cluster to another. I can't just copy sstables because the clusters have different topologies. While we're looking to upgrade soon to Cassandra 2.0.x, we're on Cassandra 1.2.19. The source data comes from a nodetool snapshot.

Here's the command I ran:

sstableloader -d IP_ADDRESSES_OF_SEED_NODES /SNAPSHOT_DIRECTORY/

Here's the result I got:

Could not retrieve endpoint ranges:
 -pr,--principal             kerberos principal
 -k,--keytab                 keytab location
 --ssl-keystore              ssl keystore location
 --ssl-keystore-password     ssl keystore password
 --ssl-keystore-type         ssl keystore type
 --ssl-truststore            ssl truststore location
 --ssl-truststore-password   ssl truststore password
 --ssl-truststore-type       ssl truststore type

Not sure what to make of this, what with the hints at security arguments that pop up. The source and destination clusters have no security. Hoping this might ring a bell with someone out there.