[
https://issues.apache.org/jira/browse/CASSANDRA-12280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420800#comment-15420800
]
Benjamin Roth commented on CASSANDRA-12280:
-------------------------------------------
I ran a repair with -tr. It hung and then failed. Reason: a broken pipe after
about 18 min of a stalled stream. I guess I cannot blame c* for a broken pipe,
so I have to investigate further. All nodes are plugged into the same switch,
so a network outage is very unlikely. Maybe it is a kernel / network driver
issue. But this leads me to another topic:
First:
It is not so nice that c* fails the current stream / plan completely on a
network error, no matter whether it is a repair, rebuild, bootstrap or anything
else. If there is a network error, the complete (probably long-running,
resource-intensive) task has to be restarted, and if it fails again, we may be
stuck in a very long and tedious loop of retries.
Isn't there a way to resume that stream / streaming plan? Maybe worth another
ticket?
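To make the idea of resuming concrete, here is a minimal sketch in Python. It is
not how c* streaming works internally; the function name transfer_with_resume,
the send callback and the chunk model are all hypothetical and only illustrate
the principle: remember what was already transferred and skip it on the next
attempt instead of starting from zero.

```python
from typing import Callable, List, Set


def transfer_with_resume(
    chunks: List[bytes],
    send: Callable[[int, bytes], None],
    max_attempts: int = 3,
) -> Set[int]:
    """Send every chunk, resuming after a network error instead of restarting.

    Hypothetical illustration only; `send` stands in for whatever actually
    moves one piece of data to the peer.
    """
    done: Set[int] = set()  # indexes of chunks that were sent successfully
    for attempt in range(1, max_attempts + 1):
        try:
            for index, chunk in enumerate(chunks):
                if index in done:
                    continue  # already transferred in an earlier attempt
                send(index, chunk)
                done.add(index)
            return done
        except ConnectionError:
            # BrokenPipeError is a subclass of ConnectionError in Python.
            if attempt == max_attempts:
                raise
    return done
```

With a send callback that raises BrokenPipeError once in the middle, the second
attempt only transfers the chunks that are still missing.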
Second:
I noticed some kind of repair inconsistency. For testing, I ran exactly the
same repair task (range repair on a single CF: nodetool repair -full -st
674495715222060467 -et 722653000558723919 visits visits_in) over and over
again. I would expect that once the range is repaired, there should be no more
"out of sync" ranges. But this is the result:
https://gist.github.com/brstgt/3066aea556fa2eab59f983d51dd8035a.
Yes, in these cases, the repair returned successfully - at least that is what
nodetool + logs say.
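For anyone who wants to repeat this check, a rough Python sketch follows. The
nodetool arguments are the ones quoted above. The log line pattern ("have N
range(s) out of sync for <table>") is an assumption about what the repair log
looks like and should be verified against your own logs, and the helper names
are hypothetical.

```python
import re
import subprocess
from typing import List

# Assumed log line shape; check it against your own logs before relying on it.
OUT_OF_SYNC = re.compile(r"have (\d+) range\(s\) out of sync for (\S+)")


def repair_command(keyspace: str, table: str, start: int, end: int) -> List[str]:
    """Build the full range-repair command for one table."""
    return ["nodetool", "repair", "-full",
            "-st", str(start), "-et", str(end), keyspace, table]


def count_out_of_sync(log_text: str, table: str) -> int:
    """Sum the out-of-sync range counts reported for one table."""
    return sum(int(n) for n, t in OUT_OF_SYNC.findall(log_text) if t == table)


def run_repair(cmd: List[str]) -> int:
    """Run one repair and return nodetool's exit code (needs a live cluster)."""
    return subprocess.run(cmd, check=False).returncode
```

Running run_repair(repair_command("visits", "visits_in", 674495715222060467,
722653000558723919)) several times and feeding the log lines written by each
run to count_out_of_sync should give 0 from the second run onwards if the
range really stays repaired.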
Sorry for all these mixed issues. Sure, I can create tickets for each of them,
but maybe they are all somehow related, so I thought I'd first collect all
information in a single place before splitting it up again. Just let me know
how to do better.
> nodetool repair hangs
> ---------------------
>
> Key: CASSANDRA-12280
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12280
> Project: Cassandra
> Issue Type: Bug
> Reporter: Benjamin Roth
>
> nodetool repair hangs when repairing a keyspace, does not hang when
> repairing table/mv by table/mv.
> Command executed (both variants make it hang):
> nodetool repair likes like dislike_by_source_mv like_by_contact_mv
> match_valid_mv like_out dislike match match_by_contact_mv like_valid_mv
> like_out_by_source_mv
> OR
> nodetool repair likes
> Logs:
> https://gist.github.com/brstgt/bf8b20fa1942d29ab60926ede7340b75
> Nodetool output:
> https://gist.github.com/brstgt/3aa73662da4b0190630ac1aad6c90a6f
> Schema:
> https://gist.github.com/brstgt/3fd59e0166f86f8065085532e3638097
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)