No, the cluster seems to be performing just fine. It seems that the
prepareForRepair() callback could easily be modified to print which node(s)
fail to respond, so that the debugging effort could be focused better. Of
course that doesn't help in this case, since it's not trivial to add the
log lines and roll the change out to the entire cluster.
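
Something roughly like the following is what I had in mind (just a sketch
with made-up names to show the idea, not the actual ActiveRepairService
code): keep track of the endpoints that haven't replied yet, and name them
when the wait times out.

import java.net.InetAddress;
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not the real Cassandra code: tracks which endpoints
// have not yet acknowledged the repair prepare message, so that a timeout
// can name them instead of only saying
// "Did not get positive replies from all endpoints".
public class PrepareReplyTracker
{
    private final Set<InetAddress> pending =
            Collections.newSetFromMap(new ConcurrentHashMap<InetAddress, Boolean>());
    private final CountDownLatch latch;

    public PrepareReplyTracker(Set<InetAddress> endpoints)
    {
        pending.addAll(endpoints);
        latch = new CountDownLatch(endpoints.size());
    }

    // Called from the reply callback when an endpoint answers positively.
    public void onPositiveReply(InetAddress from)
    {
        if (pending.remove(from))
            latch.countDown();
    }

    // Wait for all replies; on timeout, log exactly which endpoints stayed silent.
    public boolean awaitReplies(long timeout, TimeUnit unit) throws InterruptedException
    {
        if (latch.await(timeout, unit))
            return true;
        System.err.println("Did not get positive replies from: " + pending);
        return false;
    }
}

That would at least tell which node(s) to go look at.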

The cluster is relatively young, containing only 450GB with RF=3 spread
over nine nodes, and I was still practicing how to run incremental repairs
on it when I stumbled on this issue.

On Thu, Oct 30, 2014 at 12:52 PM, Rahul Neelakantan <ra...@rahul.be> wrote:

> It appears to come from the ActiveRepairService.prepareForRepair portion
> of the Code.
>
> Are you sure all nodes are reachable from the node you are initiating
> repair on, at the same time?
>
> Any Node up/down/died messages?
>
> Rahul Neelakantan
>
> > On Oct 30, 2014, at 6:37 AM, Juho Mäkinen <juho.maki...@gmail.com>
> wrote:
> >
> > I'm having problems running nodetool repair -inc -par -pr on my 2.1.1
> cluster due to "Did not get positive replies from all endpoints" error.
> >
> > Here's an example output:
> > root@db08-3:~# nodetool repair -par -inc -pr
> > [2014-10-30 10:33:02,396] Nothing to repair for keyspace 'system'
> > [2014-10-30 10:33:02,420] Starting repair command #10, repairing 256
> ranges for keyspace profiles (seq=false, full=false)
> > [2014-10-30 10:33:17,240] Repair failed with error Did not get positive
> replies from all endpoints.
> > [2014-10-30 10:33:17,263] Starting repair command #11, repairing 256
> ranges for keyspace OpsCenter (seq=false, full=false)
> > [2014-10-30 10:33:32,242] Repair failed with error Did not get positive
> replies from all endpoints.
> > [2014-10-30 10:33:32,249] Starting repair command #12, repairing 256
> ranges for keyspace system_traces (seq=false, full=false)
> > [2014-10-30 10:33:44,243] Repair failed with error Did not get positive
> replies from all endpoints.
> >
> > The local system log shows that the repair commands got started, but it
> seems that they immediately get cancelled due to that error, which, by the
> way, can't be seen in the Cassandra log.
> >
> > I tried monitoring the logs on all machines in case another machine
> showed some useful error, but so far I haven't found anything.
> >
> > Any ideas where this error comes from?
> >
> >  - Garo
> >
>
