[
https://issues.apache.org/jira/browse/CASSANDRA-15674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082690#comment-17082690
]
David Capwell commented on CASSANDRA-15674:
-------------------------------------------
bq. why you chose the hooks that were added vs using the existing
onCommit/onAbort API with a delegate or sub-class implementation used only for
IndexSummaryRedistribution.
IndexSummaryRedistribution mutates disk in-place but relies on the metrics to
be updated via transactions. Right now the commit case works since
IndexSummaryRedistribution will delete the current size (before the mutation)
so the readd will update the size. In the abort case we need to know the
deltas (can only compute right after mutating), so would need a list of deltas
to apply in onAbort. This logic is very specific to IndexSummaryRedistribution
and would rely on extending LifecycleTransaction to apply this list. I felt
that commit/abort hooks were a generalization of this and could be used if
other use cases needed.
bq. Pending counter: can you say a bit more about how you would see an operator
using this metric?
PendingSSTableReleases was mostly added for tests to know when everything is
fully released (couldn't find any other way). Its also the reason
liveDiskSpaceUsed !=totalDiskSpaceUsed, so it also exposes to operators that
things line up (if they are not equal then pending should be > 0; else there is
a bug).
bq. consider renaming Listener#onPreSample to Listener#beforeResample
Done.
bq. IndexSummaryRedistribution constructor should be marked @VisibleForTest
Done
bq. DiskSpaceMetricsTest: Is the TODO on L125 still valid or can it be removed?
Looks like the latter
There is desire to extract the jvm dtest tests out so they can run against any
version. This TODO is mostly commenting that the test could not be written
generic to versions since it depends on JMX values which are not implemented in
jvm dtests.
I can file a ticket and remove the TODO.
> liveDiskSpaceUsed and totalDiskSpaceUsed get corrupted if
> IndexSummaryRedistribution gets interrupted
> -----------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-15674
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15674
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Compaction, Observability/Metrics
> Reporter: David Capwell
> Assignee: David Capwell
> Priority: Normal
> Fix For: 4.0-alpha
>
>
> IndexSummaryRedistribution is a compaction task and as such extends Holder
> and supports cancelation by throwing a CompactionInterruptedException. The
> issue is that IndexSummaryRedistribution tries to use transactions, but
> mutates the sstable in-place; transaction is unable to roll back.
> This would be fine (only updates summary) if it wasn’t for the fact the task
> attempts to also mutate the two metrics liveDiskSpaceUsed and
> totalDiskSpaceUsed, since these can’t be rolled back any cancelation could
> corrupt these metrics.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]