I'd like to improve my mental model of how Cassandra bootstrapping works. My understanding is that bootstrapping is just an extra step during a node's startup where the node copies data from neighboring nodes that, according to its token, it should own; afterwards, the node behaves like any other node.
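
To make that mental model concrete, here is a minimal sketch of the ownership rule I have in mind. It is not Cassandra's code: the class `RingSketch`, the methods `ownerOf` and `bootstrapPlan`, the small integer tokens, and the node names A to D are all made up for illustration. It assumes that a key belongs to the first node whose token is greater than or equal to the key's token (wrapping around the ring), and it ignores replication entirely.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a token ring; NOT Cassandra's implementation. */
final class RingSketch {

    /** Owner of keyToken: the first node token >= keyToken, wrapping to the lowest token. */
    static String ownerOf(TreeMap<Long, String> ring, long keyToken) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no tokens");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(keyToken);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** Describes which range a joining node would copy, and from which current owner. */
    static String bootstrapPlan(TreeMap<Long, String> ring, String newNode, long newToken) {
        // The node that owns newToken today holds the data the new node will take over.
        String source = ownerOf(ring, newToken);
        // The range starts just after the previous token on the ring (wrapping if needed).
        Long lower = ring.lowerKey(newToken);
        long start = (lower != null) ? lower : ring.lastKey();
        return newNode + " copies (" + start + ", " + newToken + "] from " + source;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(10L, "A");
        ring.put(20L, "B");
        ring.put(30L, "C");

        System.out.println(ownerOf(ring, 22L));            // owner before D joins
        System.out.println(bootstrapPlan(ring, "D", 25L)); // what D would copy
        ring.put(25L, "D");                                // D is now part of the ring
        System.out.println(ownerOf(ring, 22L));            // owner after D joins
    }
}
```

In this sketch the moment of `ring.put(25L, "D")` is exactly the step my first question below is about: before it, reads and writes for token 22 go to C; after it, they go to D.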

If that's correct, I have a few follow-up questions:

- At what point does the new node get inserted into the hash ring so that reads/writes for keys get directed to it?
- What are the semantics of bootstrapping a node that's been in the cluster before and already has some data that's possibly outdated? Should this work? This might be useful if a node's been out of commission for a sufficiently long period of time.
- If we pick a poor set of initial tokens, would it be sensible to modify the tokens on existing nodes and then restart them with bootstrapping in order to rebalance?

I've also noticed that I can get my Cassandra cluster into a weird state via bootstrapping, where it stops accepting reads/writes. I'm on Cassandra 0.4.1. A simple repro case is to start all 3 nodes of a 3-node cluster (replication factor of 2) using bootstrapping. Getting a key that I've inserted then leads to an IndexOutOfBoundsException. Another IndexOutOfBoundsException was thrown later while flushing.

DEBUG [pool-1-thread-2] 2009-10-28 02:18:19,907 CassandraServer.java (line 258) get
DEBUG [pool-1-thread-2] 2009-10-28 02:18:19,908 CassandraServer.java (line 307) multiget
ERROR [pool-1-thread-2] 2009-10-28 02:18:19,912 Cassandra.java (line 647) Internal error processing get
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at org.apache.cassandra.locator.RackUnawareStrategy.getStorageTokens(RackUnawareStrategy.java:99)
    at org.apache.cassandra.locator.RackUnawareStrategy.getReadStorageEndPoints(RackUnawareStrategy.java:68)
    at org.apache.cassandra.locator.RackUnawareStrategy.getReadStorageEndPoints(RackUnawareStrategy.java:45)
    at org.apache.cassandra.service.StorageService.getReadStorageEndPoints(StorageService.java:949)
    at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:296)
    at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100)
    at org.apache.cassandra.service.CassandraServer.multigetColumns(CassandraServer.java:271)
    at org.apache.cassandra.service.CassandraServer.multigetInternal(CassandraServer.java:325)
    at org.apache.cassandra.service.CassandraServer.multiget(CassandraServer.java:308)
    at org.apache.cassandra.service.CassandraServer.get(CassandraServer.java:259)
    at org.apache.cassandra.service.Cassandra$Processor$get.process(Cassandra.java:639)
    at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-28 02:18:45,719 ColumnFamilyStore.java (line 369) LocationInfo has reached its threshold; switching in a fresh Memtable
INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-28 02:18:45,720 ColumnFamilyStore.java (line 1178) Enqueuing flush of Memtable(LocationInfo)@1048641931
INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-28 02:18:45,721 Memtable.java (line 186) Flushing Memtable(LocationInfo)@1048641931
DEBUG [COMMIT-LOG-WRITER] 2009-10-28 02:18:45,877 CommitLog.java (line 466) discard completed log segments for CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1256696265813.log', position=423), column family 4. CFIDs are Keyspace1: TableMetadata(Standard2: 1, Super1: 0, Standard1: 2, StandardByUUID1: 3, }), system: TableMetadata(LocationInfo: 4, HintsColumnFamily: 5, }), Analytics: TableMetadata(total: 6, domain: 7, movie: 8, provider: 9, country: 10, }), }
DEBUG [COMMIT-LOG-WRITER] 2009-10-28 02:18:45,878 CommitLog.java (line 509) Marking replay position 423 on commit log /var/lib/cassandra/commitlog/CommitLog-1256696265813.log
INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-28 02:18:45,878 Memtable.java (line 220) Completed flushing /var/lib/cassandra/data/system/LocationInfo-1-Data.db
DEBUG [BOOT-STRAPPER:1] 2009-10-28 02:18:45,954 BootStrapper.java (line 83) Exception was generated at : 10/28/2009 02:18:45 on thread BOOT-STRAPPER:1
-1
java.lang.ArrayIndexOutOfBoundsException: -1
    at java.util.ArrayList.get(ArrayList.java:324)
    at org.apache.cassandra.service.StorageService.getAllRanges(StorageService.java:886)
    at org.apache.cassandra.dht.BootStrapper.getRangesWithSourceTarget(BootStrapper.java:98)
    at org.apache.cassandra.dht.BootStrapper.run(BootStrapper.java:73)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

Edmond