Yes, I have MVs.

Interesting is also that in the middle of bootstrapping (cannot tell when
exactly) it seemed like other nodes started to send hints to the
bootstrapping node. When that happened, it seems that every single HintVerb
fails also with a WTE. At least the logs are completely flooded with WTE.
When I paused hints on all other nodes, logs were quiet again.

I completely restarted the bootstrap (deleted /var/lib/cassandra) - this
time with hints paused from the beginning. We will see if that changes
anything.

I find it also quite weird that other nodes have hints for a bootstrapping
node. Is that intended behaviour?
And is it possible that streaming locks the whole CF? I looked like
absolutely no hint could be delivered successfully.

2016-11-25 19:43 GMT+01:00 Paulo Motta <pauloricard...@gmail.com>:

> If you have an MV table It seems you're hitting https://issues.apache.
> org/jira/browse/CASSANDRA-12905. I will bump it's priority to critical
> since it can prevent or difficult bootstrap.
>
> Did you try resuming bootstrap with "nodetool bootstrap resume" after the
> failure? It may eventually succeed, since this is an MV lock contention
> problem.
>
> 2016-11-25 15:59 GMT-02:00 Benjamin Roth <benjamin.r...@jaumo.com>:
>
>> Hi!
>>
>> Today I wanted a new node to join the cluster.
>> When looking at netstats on all the old nodes, it seemed like the
>> streaming sessions did complete.
>> They all said that all files have been transferred. But looking at the
>> debug.log the stream sessions finished with an error.
>> Also after all streams have been done the node remains in state "JOINING".
>>
>> See logs:
>>
>> debug.log, last words
>> ========
>> ERROR [StreamReceiveTask:94] 2016-11-25 17:50:51,712
>> StreamSession.java:593 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>> Streaming error occurred on session with peer 10.23.71.6
>> org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed
>> out - received only 0 responses.
>> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:525)
>> ~[apache-cassandra-3.10.jar:3.10]
>> at org.apache.cassandra.db.Keyspace.applyNotDeferrable(Keyspace.java:440)
>> ~[apache-cassandra-3.10.jar:3.10]
>> at org.apache.cassandra.db.Mutation.apply(Mutation.java:223)
>> ~[apache-cassandra-3.10.jar:3.10]
>> at org.apache.cassandra.db.Mutation.applyUnsafe(Mutation.java:242)
>> ~[apache-cassandra-3.10.jar:3.10]
>> at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletio
>> nRunnable.run(StreamReceiveTask.java:205) ~[apache-cassandra-3.10.jar:3.
>> 10]
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> [na:1.8.0_102]
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> [na:1.8.0_102]
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> [na:1.8.0_102]
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> [na:1.8.0_102]
>> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
>> DEBUG [STREAM-OUT-/10.23.71.6:7000] 2016-11-25 17:50:51,713
>> ConnectionHandler.java:388 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>> Sending Session Failed
>> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713
>> StreamSession.java:472 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>> Finishing keep-alive task.
>> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713
>> ConnectionHandler.java:120 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>> Closing stream connection handler on /10.23.71.6
>> INFO  [StreamReceiveTask:94] 2016-11-25 17:50:51,719
>> StreamResultFuture.java:187 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>> Session with /10.23.71.6 is complete
>> DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,719
>> StreamCoordinator.java:146 - Finished connecting all sessions
>> WARN  [StreamReceiveTask:94] 2016-11-25 17:50:51,723
>> StreamResultFuture.java:214 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c]
>> Stream failed
>> ERROR [main] 2016-11-25 17:50:51,724 StorageService.java:1493 - Error
>> while waiting on bootstrap to complete. Bootstrap will have to be restarted.
>> java.util.concurrent.ExecutionException: 
>> org.apache.cassandra.streaming.StreamException:
>> Stream failed
>> at com.google.common.util.concurrent.AbstractFuture$Sync.
>> getValue(AbstractFuture.java:299) ~[guava-18.0.jar:na]
>> at 
>> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
>> ~[guava-18.0.jar:na]
>> at 
>> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>> ~[guava-18.0.jar:na]
>> at 
>> org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1488)
>> [apache-cassandra-3.10.jar:3.10]
>> at 
>> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:948)
>> [apache-cassandra-3.10.jar:3.10]
>> at 
>> org.apache.cassandra.service.StorageService.initServer(StorageService.java:667)
>> [apache-cassandra-3.10.jar:3.10]
>> at 
>> org.apache.cassandra.service.StorageService.initServer(StorageService.java:598)
>> [apache-cassandra-3.10.jar:3.10]
>> at 
>> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:394)
>> [apache-cassandra-3.10.jar:3.10]
>> at 
>> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601)
>> [apache-cassandra-3.10.jar:3.10]
>> at 
>> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:735)
>> [apache-cassandra-3.10.jar:3.10]
>> Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
>> at org.apache.cassandra.streaming.StreamResultFuture.maybeCompl
>> ete(StreamResultFuture.java:215) ~[apache-cassandra-3.10.jar:3.10]
>> at org.apache.cassandra.streaming.StreamResultFuture.handleSess
>> ionComplete(StreamResultFuture.java:191) ~[apache-cassandra-3.10.jar:3.
>> 10]
>> at 
>> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:481)
>> ~[apache-cassandra-3.10.jar:3.10]
>> at 
>> org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:571)
>> ~[apache-cassandra-3.10.jar:3.10]
>> at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletio
>> nRunnable.run(StreamReceiveTask.java:251) ~[apache-cassandra-3.10.jar:3.
>> 10]
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> ~[na:1.8.0_102]
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> ~[na:1.8.0_102]
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> ~[na:1.8.0_102]
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> ~[na:1.8.0_102]
>> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_102]
>> WARN  [StreamReceiveTask:94] 2016-11-25 17:50:51,731
>> StorageService.java:1483 - Error during bootstrap.
>> org.apache.cassandra.streaming.StreamException: Stream failed
>> at org.apache.cassandra.streaming.management.StreamEventJMXNoti
>> fier.onFailure(StreamEventJMXNotifier.java:88)
>> ~[apache-cassandra-3.10.jar:3.10]
>> at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
>> [guava-18.0.jar:na]
>> at com.google.common.util.concurrent.MoreExecutors$DirectExecut
>> or.execute(MoreExecutors.java:457) [guava-18.0.jar:na]
>> at 
>> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>> [guava-18.0.jar:na]
>> at 
>> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>> [guava-18.0.jar:na]
>> at 
>> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
>> [guava-18.0.jar:na]
>> at org.apache.cassandra.streaming.StreamResultFuture.maybeCompl
>> ete(StreamResultFuture.java:215) [apache-cassandra-3.10.jar:3.10]
>> at org.apache.cassandra.streaming.StreamResultFuture.handleSess
>> ionComplete(StreamResultFuture.java:191) [apache-cassandra-3.10.jar:3.10]
>> at 
>> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:481)
>> [apache-cassandra-3.10.jar:3.10]
>> at 
>> org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:571)
>> [apache-cassandra-3.10.jar:3.10]
>> at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletio
>> nRunnable.run(StreamReceiveTask.java:251) [apache-cassandra-3.10.jar:3.1
>> 0]
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> [na:1.8.0_102]
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> [na:1.8.0_102]
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> [na:1.8.0_102]
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> [na:1.8.0_102]
>> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
>>
>> netstat output of 10.23.71.6, the node mentioned above in the debug.log,
>> obviously all files + bytes have been transferred
>> ==========
>> Mode: NORMAL
>> Bootstrap b998aec0-b2fd-11e6-a63d-75828fa8d45c
>> /10.23.71.8
>> Sending 1598 files, 60610896516 bytes total. Already sent 1598 files,
>> 60610896516 bytes total
>> /var/lib/cassandra/data/log/log_fake-b130c05070e611e6986e29a4f0eae2e7/mc-97218-big-Data.db
>> 29425392/29425392 bytes(100%) sent to idx:0/10.23.71.8
>> ...
>>
>> nodetool info on new node says
>> ==========
>> ID                     : 9dedcc9a-d951-4c7a-b794-434db1af960f
>> Gossip active          : true
>> Thrift active          : true
>> Native Transport active: true
>> Load                   : 185.2 GiB
>> Generation No          : 1480071319
>> Uptime (seconds)       : 25082
>> Heap Memory (MB)       : 3607.60 / 15974.44
>> Off Heap Memory (MB)   : 405.70
>> Data Center            : DC1
>> Rack                   : RAC1
>> Exceptions             : 0
>> Key Cache              : entries 1312994, size 100 MiB, capacity 100 MiB,
>> 126413327 hits, 162672698 requests, 0.777 recent hit rate, 14400 save
>> period in seconds
>> Row Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0
>> hits, 0 requests, NaN recent hit rate, 0 save period in seconds
>> Counter Cache          : entries 0, size 0 bytes, capacity 50 MiB, 0
>> hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
>> Chunk Cache            : entries 7680, size 480 MiB, capacity 480 MiB,
>> 8277584 misses, 147262566 requests, 0.944 recent hit rate, 835.412
>> microseconds miss latency
>> Percent Repaired       : 4.232661592656687%
>> Token                  : (node is not joined to the cluster)
>>
>> Any idea whats going wrong?
>>
>> Same situation was when I bootstrapped a node last time. In the end I
>> just started the node with auto_bootstrap=false to get it up and running
>> and I ran repair afterwards. I'd like to avoid that repair and all the
>> inconsistencies this time.
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

Reply via email to