[
https://issues.apache.org/jira/browse/CASSANDRA-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974462#comment-13974462
]
Jackson Chung commented on CASSANDRA-6415:
------------------------------------------
I ran into the stuck-repair issue on 1.2.10.
After upgrading to 1.2.16, repair is no longer "stuck" in the sense that I see
multiple repair sessions/stages start and finish.
But in the end (after waiting a long time), there is no more activity in the
log or in compactionstats/netstats, yet tpstats still shows Active and
Pending counts in these stages:
AntiEntropyStage       1   2   5073   0   0
AntiEntropySessions    1   1     44   0   0
> Snapshot repair blocks for ever if something happens to the "I made my
> snapshot" response
> -----------------------------------------------------------------------------------------
>
> Key: CASSANDRA-6415
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6415
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jeremiah Jordan
> Assignee: Yuki Morishita
> Labels: repair
> Fix For: 1.2.13, 2.0.4
>
> Attachments: 6415-1.2.txt
>
>
> The "snapshotLatch.await();" call can wait forever and block all repair
> operations indefinitely if another node fails to respond for any reason.
> {noformat}
> public void makeSnapshots(Collection<InetAddress> endpoints)
> {
>     try
>     {
>         snapshotLatch = new CountDownLatch(endpoints.size());
>         IAsyncCallback callback = new IAsyncCallback()
>         {
>             public boolean isLatencyForSnitch()
>             {
>                 return false;
>             }
>             public void response(MessageIn msg)
>             {
>                 RepairJob.this.snapshotLatch.countDown();
>             }
>         };
>         for (InetAddress endpoint : endpoints)
>             MessagingService.instance().sendRR(new SnapshotCommand(tablename, cfname, sessionName, false).createMessage(), endpoint, callback);
>         snapshotLatch.await();
>         snapshotLatch = null;
>     }
>     catch (InterruptedException e)
>     {
>         throw new RuntimeException(e);
>     }
> }
> {noformat}
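
For illustration only, here is a minimal standalone sketch of how a latch wait
can be bounded so that one missing response cannot block the caller forever.
It is not the attached 6415-1.2.txt patch and it uses no Cassandra classes. The
class name BoundedSnapshotWait, the method awaitResponses, and the 200 ms
timeout are all hypothetical. It relies only on the standard
CountDownLatch.await(long, TimeUnit) overload, which returns false if the
timeout elapses before the count reaches zero.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Cassandra: shows a bounded latch wait.
public class BoundedSnapshotWait
{
    /**
     * Waits for the latch to reach zero, giving up after the timeout.
     * Returns true only if every expected response arrived in time.
     */
    public static boolean awaitResponses(CountDownLatch latch, long timeout, TimeUnit unit)
    {
        try
        {
            // Unlike the no-argument await(), this overload returns false on timeout.
            return latch.await(timeout, unit);
        }
        catch (InterruptedException e)
        {
            // Restore the interrupt flag and report the wait as incomplete.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args)
    {
        // Simulate three endpoints where only two ever acknowledge.
        CountDownLatch latch = new CountDownLatch(3);
        latch.countDown();
        latch.countDown();

        boolean complete = awaitResponses(latch, 200, TimeUnit.MILLISECONDS);
        System.out.println(complete
                           ? "all snapshots acknowledged"
                           : "timed out waiting for " + latch.getCount() + " response(s)");
        // prints "timed out waiting for 1 response(s)"
    }
}
```

With a bounded wait the caller still has to decide what a timeout means (fail
the repair job, retry, or report the unresponsive endpoint). That policy is
left out of this sketch.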