[
https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829017#comment-15829017
]
Jeff Jirsa commented on CASSANDRA-9143:
---------------------------------------
I know [~krummas] has review (and he's certainly the most qualified and most
likely to give the best feedback here), but a I took a few notes as I read
through the patches:
- ARS {{registerParentRepairSession}} assert is inverted
-
https://github.com/bdeggleston/cassandra/commit/8e7eb081625b1749716f60bcb109ade8c84d8558#diff-93e6fa14f908d0ce3c24d56fbf484ba3R88
- double-checked locking needs volatile
- Comment in CompactionStrategyManager about 2 strategies per data dir is now
misleading, if not incorrect (also have pendingRepairs, which may be multiple
other strategies?):
https://github.com/bdeggleston/cassandra/blob/9143-trunk/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java#L59
- {{ConsistentSession.AbstractBuilder}} doesn't need to be public (especially
with other {{AbstractBuilder}}s in the codebase)
- {{LocalSessions.start()}} loads rows creating builders that should always
work, but we've seen in the past (like CASSANDRA-12700) that we shouldn't rely
on all of those values to be correct - maybe you can try to explicitly handle
invalid rows being returned? If incomplete row/session, maybe that's failed by
definition?
- If you remove the enum definition for anticompaction request (
https://github.com/bdeggleston/cassandra/commit/76ee1a667818c5c72aa513c4a75777b1400cb69d#diff-9a5c76380064186d8f89003e1bab73bfL46
), and we're in a mixed cluster for some reason, and get that verb on the
wire, we'll throw - perhaps instead we should probably keep that verb, but
log+ignore it if received.
Less Importantly:
-
https://github.com/bdeggleston/cassandra/commit/8e7eb081625b1749716f60bcb109ade8c84d8558#diff-93e6fa14f908d0ce3c24d56fbf484ba3R303
- {{needsCleanup()}} could be renamed to avoid potentially confusing
{{CompactionManager.needsCleanup()}}, especially since {{PendingRepairManager}}
is very very much like {{CompactionManager}} (seems as if you may have started
this in
https://github.com/bdeggleston/cassandra/commit/ade7fe3373a1b44da02caaefc3180503d298e92b
)
-
https://github.com/bdeggleston/cassandra/blob/9143-trunk/src/java/org/apache/cassandra/repair/consistent/LocalSessions.java#L605
- log from address as well, otherwise the log message is much less useful?
-
https://github.com/bdeggleston/cassandra/blob/9143-trunk/src/java/org/apache/cassandra/repair/consistent/LocalSessions.java#L619
- could pass address for better logging here as well (and pretty much all of
these, {{handlePrepareMessage}}, {{handleFinalizeCommitMessage}}, etc)
> Improving consistency of repairAt field across replicas
> --------------------------------------------------------
>
> Key: CASSANDRA-9143
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9143
> Project: Cassandra
> Issue Type: Improvement
> Reporter: sankalp kohli
> Assignee: Blake Eggleston
>
> We currently send an anticompaction request to all replicas. During this, a
> node will split stables and mark the appropriate ones repaired.
> The problem is that this could fail on some replicas due to many reasons
> leading to problems in the next repair.
> This is what I am suggesting to improve it.
> 1) Send anticompaction request to all replicas. This can be done at session
> level.
> 2) During anticompaction, stables are split but not marked repaired.
> 3) When we get positive ack from all replicas, coordinator will send another
> message called markRepaired.
> 4) On getting this message, replicas will mark the appropriate stables as
> repaired.
> This will reduce the window of failure. We can also think of "hinting"
> markRepaired message if required.
> Also the stables which are streaming can be marked as repaired like it is
> done now.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)