Re: How to restart bootstrap after a failed streaming due to Broken Pipe (1.2.16)

Colin Kuo Mon, 09 Jun 2014 22:45:07 -0700

You can use "nodetool repair" instead. Repair can re-transmit the data
that belongs to the new node.
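
A minimal sketch of that suggestion in Python. It assumes `nodetool` is on
the PATH and that the target node is selected with nodetool's `-h <host>`
option. The host and keyspace values are hypothetical placeholders. The
sketch only builds and (optionally) runs a plain `nodetool repair`. It does
not try to resume the interrupted bootstrap stream, and by default it only
prints the command (dry run).

```python
import shlex
import subprocess
from typing import List, Optional


def build_repair_command(host: str, keyspace: Optional[str] = None) -> List[str]:
    """Return the argument list for `nodetool -h <host> repair [keyspace]`.

    `host` and `keyspace` are placeholders; substitute your own values.
    """
    cmd = ["nodetool", "-h", host, "repair"]
    if keyspace:
        cmd.append(keyspace)  # limit the repair to one keyspace
    return cmd


def run_repair(host: str, keyspace: Optional[str] = None, dry_run: bool = True) -> int:
    """Print the repair command and, unless dry_run is set, execute it.

    Returns nodetool's exit code, or 0 for a dry run.
    """
    cmd = build_repair_command(host, keyspace)
    print(shlex.join(cmd))
    if dry_run:
        return 0
    # No shell is involved: the list is passed straight to the OS.
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Hypothetical host and keyspace names; dry run only prints the command.
    run_repair("new-node.example", "my_keyspace")
```

The command is kept as a list rather than a shell string so it can be handed
to `subprocess.run` without shell quoting issues. Pass `dry_run=False` to
actually start the repair on the new node.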




On Tue, Jun 10, 2014 at 10:40 AM, Mike Heffner <m...@librato.com> wrote:

> Hi,
>
> During an attempt to bootstrap a new node into a 1.2.16 ring the new node
> saw one of the streaming nodes periodically disappear:
>
>  INFO [GossipTasks:1] 2014-06-10 00:28:52,572 Gossiper.java (line 823) InetAddress /10.156.1.2 is now DOWN
> ERROR [GossipTasks:1] 2014-06-10 00:28:52,574 AbstractStreamSession.java (line 108) Stream failed because /10.156.1.2 died or was restarted/removed (streams may still be active in background, but further streams won't be started)
>  WARN [GossipTasks:1] 2014-06-10 00:28:52,574 RangeStreamer.java (line 246) Streaming from /10.156.1.2 failed
>  INFO [HANDSHAKE-/10.156.1.2] 2014-06-10 00:28:57,922 OutboundTcpConnection.java (line 418) Handshaking version with /10.156.1.2
>  INFO [GossipStage:1] 2014-06-10 00:28:57,943 Gossiper.java (line 809) InetAddress /10.156.1.2 is now UP
>
> This brief interruption was enough to kill the streaming from node
> 10.156.1.2. Node 10.156.1.2 saw a similar "broken pipe" exception from the
> bootstrapping node:
>
> ERROR [Streaming to /10.156.1.3:1] 2014-06-10 01:22:02,345 CassandraDaemon.java (line 191) Exception in thread Thread[Streaming to /10.156.1.3:1,5,main]
> java.lang.RuntimeException: java.io.IOException: Broken pipe
>         at com.google.common.base.Throwables.propagate(Throwables.java:160)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
> Caused by: java.io.IOException: Broken pipe
>         at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
>         at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:420)
>         at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:552)
>         at org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:93)
>         at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>
>
> During bootstrapping we noticed a significant spike in CPU and latency
> across the board on the ring (CPU 50->85% and write latencies 60ms ->
> 250ms). It seems likely that this persistent high load led to the hiccup
> that caused the gossiper to see the streaming node as briefly down.
>
> What is the proper way to recover from this? The original estimate was
> almost 24 hours to stream all the data required to bootstrap this single
> node (streaming set to unlimited) and this occurred 6 hours into the
> bootstrap. With such high load from streaming it seems that simply
> restarting will inevitably hit this problem again.
>
>
> Cheers,
>
> Mike
>
> --
>
>   Mike Heffner <m...@librato.com>
>   Librato, Inc.
>
>
