[ https://issues.apache.org/jira/browse/CASSANDRA-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008899#comment-14008899 ]
Joshua McKenzie commented on CASSANDRA-3569:
--------------------------------------------
I'm seeing similar output on the receiving side with a check for skip < 0
added in drain (sketched after the netstats below):
{code:title=receiving_netstats}
Mode: NORMAL
Repair 78e66860-e4e0-11e3-8b10-0195b332f618
/192.168.1.31
Repair 7aadbae0-e4e0-11e3-8b10-0195b332f618
/192.168.1.31
Receiving 4 files, 2383442 bytes total
Repair 79be51d0-e4e0-11e3-8b10-0195b332f618
/192.168.1.31
Receiving 5 files, 866604 bytes total
Repair 7a0a4ef0-e4e0-11e3-8b10-0195b332f618
/192.168.1.31
Receiving 5 files, 477981 bytes total
Repair 79673120-e4e0-11e3-8b10-0195b332f618
/192.168.1.31
Receiving 5 files, 1014129 bytes total
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name Active Pending Completed
Commands n/a 1 25
Responses n/a 76 136
{code}
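For reference, the guard is roughly this shape (a sketch with illustrative names, not the exact patch):
{code:title=drain_guard_sketch|java}
import java.io.IOException;
import java.io.InputStream;

// Sketch of the skip < 0 guard described above; names are illustrative only.
final class DrainSketch
{
    static void drain(InputStream in, long toSkip) throws IOException
    {
        if (toSkip < 0)
            return; // bad byte accounting upstream; never hand skip() a negative count
        while (toSkip > 0)
        {
            long skipped = in.skip(toSkip);
            if (skipped <= 0)
                break; // end of stream, or skip() can make no progress
            toSkip -= skipped;
        }
    }
}
{code}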
That new logic, however, generates the following exception(s):
{code:title=receiving_exception}
ERROR 14:18:11 Exception in thread Thread[NonPeriodicTasks:1,5,main]
java.lang.AssertionError: null
    at org.apache.cassandra.io.util.Memory.free(Memory.java:299) ~[main/:na]
    at org.apache.cassandra.utils.obs.OffHeapBitSet.close(OffHeapBitSet.java:143) ~[main/:na]
    at org.apache.cassandra.utils.BloomFilter.close(BloomFilter.java:116) ~[main/:na]
    at org.apache.cassandra.io.sstable.SSTableWriter.abort(SSTableWriter.java:341) ~[main/:na]
    at org.apache.cassandra.io.sstable.SSTableWriter.abort(SSTableWriter.java:326) ~[main/:na]
    at org.apache.cassandra.streaming.StreamReceiveTask$1.run(StreamReceiveTask.java:132) ~[main/:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_55]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_55]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) ~[na:1.7.0_55]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) ~[na:1.7.0_55]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_55]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_55]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55]
{code}
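The assertion looks like Memory.free tripping on an allocation that was already released, which suggests SSTableWriter.abort ends up closing the bloom filter's off-heap bitset twice. The failure mode, in miniature (not the actual Cassandra source):
{code:title=double_free_sketch|java}
// Miniature of the pattern the trace suggests: a close() that frees
// off-heap memory must be idempotent, or a second abort()/close()
// trips the "already freed" assertion.
final class OffHeapResource implements AutoCloseable
{
    private long peer; // native address; 0 once freed

    OffHeapResource(long peer) { this.peer = peer; }

    @Override
    public void close()
    {
        assert peer != 0; // fires on the second close, like Memory.free here
        // ... release the native allocation ...
        peer = 0;
    }
}
{code}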
As for the netstats output, it looks like the SessionInfo entries for these
plans aren't getting cleared out for some reason. While I can't reproduce
that behavior on the sending side, hopefully cleaning it up on the receiving
side will shed some light on why you're seeing that output on the sender.
> Failure detector downs should not break streams
> -----------------------------------------------
>
> Key: CASSANDRA-3569
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3569
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Peter Schuller
> Assignee: Joshua McKenzie
> Fix For: 2.1.1
>
> Attachments: 3569-2.0.txt, 3569_v1.txt
>
>
> CASSANDRA-2433 introduced this behavior just so repairs don't sit there
> waiting forever. In my opinion the correct fix to that problem is to use TCP
> keep alive. Unfortunately the TCP keep alive period is insanely high by
> default on a modern Linux, so just doing that is not entirely good either.
> But using the failure detector seems nonsensical to me. We have a
> communication channel, the TCP transport, that we know is used for
> long-running processes we don't want killed for no good reason, and yet we
> use a failure detector, tuned to decide when not to send latency-sensitive
> requests to a node, to actively kill a working connection.
> So, rather than add complexity with protocol-based ping/pongs and such, I
> propose that we simply use TCP keep alive for streaming connections and
> instruct operators of production clusters to tweak
> net.ipv4.tcp_keepalive_{probes,intvl} as appropriate (or whatever the
> equivalent is on their OS).
> I can submit the patch. Awaiting opinions.