[ 
https://issues.apache.org/jira/browse/CASSANDRA-12280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420800#comment-15420800
 ] 

Benjamin Roth commented on CASSANDRA-12280:
-------------------------------------------

I ran a repair with -tr. It hung and then failed. Reason: a broken pipe after
about 18 minutes of a stalled stream. I guess I cannot blame C* for a broken
pipe, so I have to investigate further. All nodes are plugged into the same
switch, so a network outage is very unlikely. Maybe it is a kernel or network
driver issue. But this leads me to another topic:

First:
It is not so nice that C* fails the current stream / plan completely on a
network error, no matter whether it is a repair, rebuild, bootstrap, or
whatever. If there is a network error, the complete (probably long-running,
resource-intensive) task has to be restarted, and if it fails again, we may be
stuck in a very long and tedious loop of retries.
Isn't there a way to resume that stream / streaming plan? Maybe worth another
ticket?

Second:
I noticed some kind of repair inconsistency. For testing, I ran exactly the
same repair task (range repair on a single CF: nodetool repair -full -st
674495715222060467 -et 722653000558723919 visits visits_in) over and over
again. I would expect that once the range is repaired, there should be no more
"out of sync" ranges. But this is the result:
https://gist.github.com/brstgt/3066aea556fa2eab59f983d51dd8035a
Yes, in these cases the repair returned successfully, at least according to
nodetool and the logs.
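
For anyone who wants to reproduce the repeated run, here is a minimal sketch of
how it could be scripted. It only wraps the command quoted above. The helper
names build_repair_command and run_repeatedly are made up for illustration and
are not part of Cassandra or nodetool, and the sketch assumes nodetool is on
the PATH of the node where it runs. It does not parse the repair output; it
only collects the exit code and stdout of each run for manual comparison.

```python
import subprocess


def build_repair_command(keyspace, table, start_token, end_token):
    """Build the full range repair command from the comment as an argument list.

    Hypothetical helper: only the flags quoted in the comment (-full, -st, -et)
    are used.
    """
    return [
        "nodetool", "repair", "-full",
        "-st", str(start_token),
        "-et", str(end_token),
        keyspace, table,
    ]


def run_repeatedly(command, runs, runner=subprocess.run):
    """Run the same command `runs` times; return a list of (exit code, stdout).

    `runner` defaults to subprocess.run and can be swapped out for a stub
    when no cluster is available.
    """
    results = []
    for _ in range(runs):
        completed = runner(command, capture_output=True, text=True)
        results.append((completed.returncode, completed.stdout))
    return results
```

Calling run_repeatedly(build_repair_command("visits", "visits_in",
674495715222060467, 722653000558723919), runs=5) would issue the same range
repair five times in a row, so the outputs can be compared side by side.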

Sorry for mixing all these issues. I can certainly create a ticket for each of
them, but they may all be related, so I thought I would first collect all the
information in a single place before splitting it up again. Just let me know
how to do this better.

> nodetool repair hangs
> ---------------------
>
>                 Key: CASSANDRA-12280
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12280
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Benjamin Roth
>
> nodetool repair hangs when repairing a keyspace, does not hang when 
> repairing table/mv by table/mv.
> Command executed (both variants make it hang):
> nodetool repair likes like dislike_by_source_mv like_by_contact_mv 
> match_valid_mv like_out dislike match match_by_contact_mv like_valid_mv 
> like_out_by_source_mv
> OR
> nodetool repair likes
> Logs:
> https://gist.github.com/brstgt/bf8b20fa1942d29ab60926ede7340b75
> Nodetool output:
> https://gist.github.com/brstgt/3aa73662da4b0190630ac1aad6c90a6f
> Schema:
> https://gist.github.com/brstgt/3fd59e0166f86f8065085532e3638097



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
