Jason Kania created CASSANDRA-11273:
---------------------------------------
Summary: Exceptions during bootstrap cause bootstrap to hang (WORKAROUND)
Key: CASSANDRA-11273
URL: https://issues.apache.org/jira/browse/CASSANDRA-11273
Project: Cassandra
Issue Type: Bug
Components: Lifecycle
Environment: Debian Jessie (patched to current), running Cassandra 3.0.3
Reporter: Jason Kania
When bootstrapping a new node (192.168.10.10 in the logs below), the following
problem can occur because the receiving node fails to recognize some columns in
the streamed data, for reasons not yet identified. The error stops the bootstrap
from finishing and leaves it hung. If the bootstrap is resumed, it hits the same
error again, so the bootstrap can never complete.
Log from 192.168.10.8:
ERROR [STREAM-IN-/192.168.10.10] 2016-02-27 20:37:53,857 StreamSession.java:635
- [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Remote peer 192.168.10.10
failed stream session.
INFO [STREAM-IN-/192.168.10.10] 2016-02-27 20:37:53,857
StreamResultFuture.java:182 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Session with /192.168.10.10 is complete
WARN [STREAM-IN-/192.168.10.10] 2016-02-27 20:37:53,858
StreamResultFuture.java:209 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Stream failed
Debug log from 192.168.10.8:
DEBUG [STREAM-IN-/192.168.10.10] 2016-02-27 20:37:53,414
ConnectionHandler.java:262 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Received Received (79256340-bbbb-11e5-9f70-7d76a8de8480, #0)
DEBUG [STREAM-IN-/192.168.10.10] 2016-02-27 20:37:53,854
ConnectionHandler.java:262 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Received Retry (f3a137e0-024b-11e5-bb31-0d2316086bf7, #0)
DEBUG [STREAM-OUT-/192.168.10.10] 2016-02-27 20:37:53,854
ConnectionHandler.java:334 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Sending File (Header (cfId: f3a137e0-024b-11e5-bb31-0d2316086bf7, #0, version:
ma, format: BIG, estimated keys: 128, transfer size: 4653, compressed?: true,
repairedAt: 0, level: 0), file:
/home/cassandra/data/sensordb/sensor/ma-76-big-Data.db)
DEBUG [STREAM-OUT-/192.168.10.10] 2016-02-27 20:37:53,854
CompressedStreamWriter.java:63 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Start streaming file /home/cassandra/data/sensordb/sensor/ma-76-big-Data.db to
/192.168.10.10, repairedAt = 0, totalSize = 4653
DEBUG [STREAM-OUT-/192.168.10.10] 2016-02-27 20:37:53,854
CompressedStreamWriter.java:94 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Finished streaming file /home/cassandra/data/sensordb/sensor/ma-76-big-Data.db
to /192.168.10.10, bytesTransferred = 4653, totalSize = 4653
DEBUG [STREAM-IN-/192.168.10.10] 2016-02-27 20:37:53,855
ConnectionHandler.java:262 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Received Retry (faa55490-024b-11e5-bb31-0d2316086bf7, #0)
DEBUG [STREAM-OUT-/192.168.10.10] 2016-02-27 20:37:53,855
ConnectionHandler.java:334 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Sending File (Header (cfId: faa55490-024b-11e5-bb31-0d2316086bf7, #0, version:
ma, format: BIG, estimated keys: 128, transfer size: 705, compressed?: true,
repairedAt: 0, level: 0), file:
/home/cassandra/data/sensordb/sensorUnit/ma-79-big-Data.db)
DEBUG [STREAM-OUT-/192.168.10.10] 2016-02-27 20:37:53,856
CompressedStreamWriter.java:63 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Start streaming file /home/cassandra/data/sensordb/sensorUnit/ma-79-big-Data.db
to /192.168.10.10, repairedAt = 0, totalSize = 705
DEBUG [STREAM-OUT-/192.168.10.10] 2016-02-27 20:37:53,856
CompressedStreamWriter.java:94 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Finished streaming file
/home/cassandra/data/sensordb/sensorUnit/ma-79-big-Data.db to /192.168.10.10,
bytesTransferred = 705, totalSize = 705
DEBUG [STREAM-IN-/192.168.10.10] 2016-02-27 20:37:53,857
ConnectionHandler.java:262 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Received Session Failed
ERROR [STREAM-IN-/192.168.10.10] 2016-02-27 20:37:53,857 StreamSession.java:635
- [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Remote peer 192.168.10.10
failed stream session.
DEBUG [STREAM-IN-/192.168.10.10] 2016-02-27 20:37:53,857
ConnectionHandler.java:110 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Closing stream connection handler on /192.168.10.10
INFO [STREAM-IN-/192.168.10.10] 2016-02-27 20:37:53,857
StreamResultFuture.java:182 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Session with /192.168.10.10 is complete
WARN [STREAM-IN-/192.168.10.10] 2016-02-27 20:37:53,858
StreamResultFuture.java:209 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Stream failed
Output from 192.168.10.10:
[2016-02-27 20:37:53,413] received file
/home/cassandra/data/sensordb/listAttributes-79256340bbbb11e59f707d76a8de8480/ma-32-big-Data.db
(progress: 365%)
[2016-02-27 20:37:53,414] received file
/home/cassandra/data/sensordb/listAttributes-79256340bbbb11e59f707d76a8de8480/ma-32-big-Data.db
(progress: 369%)
[2016-02-27 20:37:53,865] session with /192.168.10.8 complete (progress: 369%)
[2016-02-27 20:37:53,866] Stream failed
Debug log from 192.168.10.10:
DEBUG [STREAM-IN-/192.168.10.8] 2016-02-27 20:37:53,201
CompressedStreamReader.java:80 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Start receiving file #0 from /192.168.10.8, repairedAt = 0, size = 166627, ks =
'sensordb', table = 'listAttributes'.
DEBUG [STREAM-IN-/192.168.10.8] 2016-02-27 20:37:53,412
CompressedStreamReader.java:110 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Finished receiving file #0 from /192.168.10.8 readBytes = 166627, totalSize = 166627
DEBUG [STREAM-IN-/192.168.10.8] 2016-02-27 20:37:53,412
ConnectionHandler.java:262 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Received File (Header (cfId: 79256340-bbbb-11e5-9f70-7d76a8de8480, #0, version:
ma, format: BIG, estimated keys: 128, transfer size: 166627, compressed?: true,
repairedAt: 0, level: 0), file:
/home/cassandra/data/sensordb/listAttributes-79256340bbbb11e59f707d76a8de8480/ma-32-big-Data.db)
DEBUG [STREAM-OUT-/192.168.10.8] 2016-02-27 20:37:53,412
ConnectionHandler.java:334 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Sending Received (79256340-bbbb-11e5-9f70-7d76a8de8480, #0)
DEBUG [CompactionExecutor:3] 2016-02-27 20:37:53,833 CompactionTask.java:217 -
Compacted (e224bef0-ddbb-11e5-80c0-89f591237aca) 4 sstables to
[/home/cassandra/data/system_distributed/parent_repair_history-deabd734b99d3b9c92e5fd92eb5abf14/ma-5-big,]
to level=0. 2,743,164 bytes to 685,791 (~25% of original) in 1,096ms =
0.596735MB/s. 0 total partitions merged to 57. Partition merge counts were
{4:57, }
DEBUG [STREAM-IN-/192.168.10.8] 2016-02-27 20:37:53,850
CompressedStreamReader.java:80 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Start receiving file #0 from /192.168.10.8, repairedAt = 0, size = 4653, ks =
'sensordb', table = 'sensor'.
WARN [STREAM-IN-/192.168.10.8] 2016-02-27 20:37:53,851 StreamSession.java:641
- [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Retrying for following error
java.lang.RuntimeException: Unknown column lastEvaluation during deserialization
	at org.apache.cassandra.db.SerializationHeader$Component.toHeader(SerializationHeader.java:331) ~[apache-cassandra-3.0.3.jar:3.0.3]
	at org.apache.cassandra.streaming.compress.CompressedStreamReader.read(CompressedStreamReader.java:87) ~[apache-cassandra-3.0.3.jar:3.0.3]
	at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:50) [apache-cassandra-3.0.3.jar:3.0.3]
	at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:39) [apache-cassandra-3.0.3.jar:3.0.3]
	at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:59) [apache-cassandra-3.0.3.jar:3.0.3]
	at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:261) [apache-cassandra-3.0.3.jar:3.0.3]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]
DEBUG [STREAM-OUT-/192.168.10.8] 2016-02-27 20:37:53,852
ConnectionHandler.java:334 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Sending Retry (f3a137e0-024b-11e5-bb31-0d2316086bf7, #0)
DEBUG [STREAM-IN-/192.168.10.8] 2016-02-27 20:37:53,852
ConnectionHandler.java:262 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Received null
DEBUG [STREAM-IN-/192.168.10.8] 2016-02-27 20:37:53,853
CompressedStreamReader.java:80 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Start receiving file #0 from /192.168.10.8, repairedAt = 0, size = 705, ks =
'sensordb', table = 'sensorUnit'.
WARN [STREAM-IN-/192.168.10.8] 2016-02-27 20:37:53,854 StreamSession.java:641
- [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca] Retrying for following error
java.lang.RuntimeException: Unknown column lastCheckTime during deserialization
	at org.apache.cassandra.db.SerializationHeader$Component.toHeader(SerializationHeader.java:331) ~[apache-cassandra-3.0.3.jar:3.0.3]
	at org.apache.cassandra.streaming.compress.CompressedStreamReader.read(CompressedStreamReader.java:87) ~[apache-cassandra-3.0.3.jar:3.0.3]
	at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:50) [apache-cassandra-3.0.3.jar:3.0.3]
	at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:39) [apache-cassandra-3.0.3.jar:3.0.3]
	at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:59) [apache-cassandra-3.0.3.jar:3.0.3]
	at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:261) [apache-cassandra-3.0.3.jar:3.0.3]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]
DEBUG [STREAM-IN-/192.168.10.8] 2016-02-27 20:37:53,854
ConnectionHandler.java:262 - [Stream #c9868f90-ddbb-11e5-80c0-89f591237aca]
Received null
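Both WARN entries on 192.168.10.10 fail in SerializationHeader$Component.toHeader
with "Unknown column ... during deserialization", for the sensor and sensorUnit
tables respectively. This suggests that the receiving node's schema for those
tables lacks columns (lastEvaluation, lastCheckTime) that the sender's SSTable
headers still reference. To list every affected table in a larger debug.log, a
minimal Python sketch such as the following could be used. `unknown_columns` and
its regular expressions are hypothetical helpers written against the log format
shown in this report; they are not Cassandra tooling.

```python
import re
import sys
from typing import List, Tuple

# Hypothetical patterns written against the log format shown in this report.
# "\s+" tolerates both single spaces and the line wrapping seen above.
START_RE = re.compile(r"\bks\s+=\s+'([^']+)',\s+table\s+=\s+'([^']+)'")
ERROR_RE = re.compile(r"Unknown column (\S+) during deserialization")


def unknown_columns(log_text: str) -> List[Tuple[str, str, str]]:
    """Return (keyspace, table, column) for each 'Unknown column' error.

    Each error is attributed to the most recent "Start receiving file"
    entry that precedes it in the log.
    """
    events = [(m.start(), "start", m.groups()) for m in START_RE.finditer(log_text)]
    events += [(m.start(), "error", m.groups()) for m in ERROR_RE.finditer(log_text)]
    events.sort(key=lambda event: event[0])  # restore log order

    results: List[Tuple[str, str, str]] = []
    current = None  # (keyspace, table) of the file currently being received
    for _, kind, groups in events:
        if kind == "start":
            current = groups
        elif current is not None:
            results.append((current[0], current[1], groups[0]))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python unknown_columns.py /path/to/debug.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for ks, table, column in unknown_columns(f.read()):
            print(f"{ks}.{table}: unknown column {column}")
```

Run against a debug.log, it prints one "keyspace.table: unknown column name" line
per failure. It assumes each error follows the "Start receiving file" entry for
the same file, as it does in the log above.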
Possible workaround
The following steps can be used to work around the problem:
1) In cqlsh on the new node, run:
   use system;
   select host_id from local;
2) Save that host_id UUID for later use.
3) In cassandra.yaml, set auto_bootstrap to false.
4) Stop Cassandra on the new node.
5) Remove all the contents of the data directory on the new node.
6) Copy all files from the data directory on an existing replica node to the
data directory on the new node.
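7) Start Cassandra on the new node. Either keep it network-isolated while doing
so, or restart Cassandra on the other nodes in the cluster once the new node is
up. (Until step 8 is done, the copied system tables still carry the source
replica's host_id and tokens.)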
8) In cqlsh on the new node, run:
   use system;
   update local set host_id=<host id saved previously>, tokens=null where key='local';
   update local set broadcast_address='<local IP>', listen_address='<local IP>', rpc_address='<local IP>' where key='local';
9) On the new node, run the following to flush the updated system data to disk:
   nodetool flush system local
10) Restart Cassandra on the new node.
11) Run the following on the new node to stream in the data for the token
ranges it now owns:
   nodetool repair
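Step 8 is easy to get wrong by hand: host_id must be written as an unquoted UUID
literal and the three address columns as quoted inet literals. As a minimal
sketch, the statements can be generated and validated in Python before pasting
them into cqlsh. The helper `system_local_fixups` is hypothetical (not part of
Cassandra), and it uses the fully qualified name system.local instead of
"use system;".

```python
import ipaddress
import uuid
from typing import List


def system_local_fixups(host_id: str, local_ip: str) -> List[str]:
    """Build the two system.local UPDATE statements used in step 8.

    host_id is the UUID saved in steps 1-2; local_ip is the new node's own
    address. Both are parsed first, so a typo raises ValueError here instead
    of writing a malformed value into system.local.
    """
    hid = uuid.UUID(host_id)             # host_id column is a CQL uuid
    ip = ipaddress.ip_address(local_ip)  # address columns are CQL inet
    return [
        # uuid literals are written unquoted in CQL.
        f"UPDATE system.local SET host_id = {hid}, tokens = null "
        "WHERE key = 'local';",
        # inet values are written as quoted string literals.
        f"UPDATE system.local SET broadcast_address = '{ip}', "
        f"listen_address = '{ip}', rpc_address = '{ip}' WHERE key = 'local';",
    ]


if __name__ == "__main__":
    # Example values only; substitute the saved host_id and the node's IP.
    for stmt in system_local_fixups(
        "12345678-1234-5678-1234-567812345678", "192.168.10.10"
    ):
        print(stmt)
```

Pasting the two printed statements into cqlsh on the new node is equivalent to
the hand-written statements in step 8. The validation only catches malformed
values, not a well-formed but wrong host_id or IP.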