[
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360188#comment-14360188
]
Benedict commented on CASSANDRA-8833:
-------------------------------------
At the risk of sounding like a broken record (in case my earlier statement was
missed): bloom filter resizing would need to break this assumption also, and is
sort of intrinsically linked to the discussion around summary resizing. Whether
or not we want this is another matter I'll leave aside for the moment.
Just to outline the remainder of my strategy for mitigating any and all of
these risks:
* In CASSANDRA-8568, I will:
** introduce a "stable" set of sstables that never changes, and all
non-hot-path accesses will use these so they don't risk confusion, and don't
have to worry about first/last issues
** ensure compaction strategies are only informed of changes to this "stable"
set of readers
** remove the "shadowed" state of an sstable
** make the modification of tracker state transactional and more declarative,
so both easier to follow and much harder to let get into a bad state
* CASSANDRA-8893, CASSANDRA-7066 and some related work will:
** eliminate the distinction between early open, temporary, and final files on
disk, so eliminate at least one layer of the cleanup logic (i.e. make its
requirements equivalent to summary/bf resizing)
** which also permits us to simplify the early open logic, by special casing it
much less
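To make the "stable" set idea concrete, here is a minimal sketch in Java. Every name in it (`ReaderViews`, `openEarly`, `finish`, `stableView`, `hotPathView`) is hypothetical and is not taken from Cassandra's tracker code. It also simplifies heavily: sstables are plain strings and the moved-start state of compaction inputs is not modelled. The only point is that early-opened results reach the hot read path, while the stable view and its listeners (standing in for compaction strategies) never see them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical illustration only; not Cassandra's actual tracker API.
final class ReaderViews {
    private final Set<String> stable = new LinkedHashSet<>(); // fully written sstables only
    private final Set<String> early = new LinkedHashSet<>();  // early-opened compaction results
    private final List<Consumer<Set<String>>> stableListeners = new ArrayList<>();

    // Listeners stand in for compaction strategies: they only hear about the stable set.
    synchronized void onStableChange(Consumer<Set<String>> listener) {
        stableListeners.add(listener);
    }

    // Early open: visible to the hot read path, invisible to the stable view and listeners.
    synchronized void openEarly(String sstable) {
        early.add(sstable);
    }

    // A write or compaction finished: the result joins the stable set, the inputs leave it.
    synchronized void finish(String result, Set<String> inputs) {
        early.remove(result);
        stable.removeAll(inputs);
        stable.add(result);
        Set<String> snapshot = stableView();
        for (Consumer<Set<String>> listener : stableListeners)
            listener.accept(snapshot);
    }

    // What non-hot-path callers would use.
    synchronized Set<String> stableView() {
        return Collections.unmodifiableSet(new LinkedHashSet<>(stable));
    }

    // What the hot read path would use: stable plus anything opened early.
    synchronized Set<String> hotPathView() {
        Set<String> all = new LinkedHashSet<>(stable);
        all.addAll(early);
        return Collections.unmodifiableSet(all);
    }
}
```

In this sketch a caller that only ever asks for `stableView()` cannot observe an early-opened reader, which is the property the first group of bullets is after.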
In conjunction with the major overhaul of resource cleanup, AFAICT this
mitigates most of the problems:
* resource counting is now much easier to reason about, and soon will be even
easier. Mistakes in it are also less dangerous.
* only paths we know are safe to use overlapping sstables will do so (and in
parallel we also enforce the non-overlapping rule)
* compaction doesn't have to even be aware it is happening
* what I think has been the biggest problem, the actual safe application of
state changes (which were never atomic and could actually screw themselves up
willfully through assertions), will be transactional and ensure exceptions do
not interrupt their execution. It will also encapsulate its own safe rollback,
so if we screw up somewhere, it will fix it for us.
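As a rough illustration of the all-or-nothing property described in the last bullet, the sketch below builds the new state off to the side and publishes it in a single assignment, so a failed check leaves the live state untouched. This is just one simple way to get that behaviour, not a description of what CASSANDRA-8568 implements, and `LiveSet`, `replace` and `view` are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not Cassandra's actual tracker API.
final class LiveSet {
    private Set<String> live = new HashSet<>();

    // Defensive copy so callers cannot mutate the live state.
    synchronized Set<String> view() {
        return new HashSet<>(live);
    }

    // Apply removals and additions as one unit: either every step lands or none does.
    synchronized void replace(Set<String> remove, Set<String> add) {
        Set<String> next = new HashSet<>(live); // work on a copy
        if (!next.containsAll(remove))
            throw new IllegalStateException("removing a reader that is not live");
        next.removeAll(remove);
        for (String s : add)
            if (!next.add(s))
                throw new IllegalStateException("adding a reader that is already live: " + s);
        live = next; // single publication point; skipped entirely if anything above threw
    }
}
```

Because the checks run against the copy, an exception part-way through cannot leave `live` half-updated; "rollback" here amounts to discarding the copy.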
I don't pretend it'll be 100% right first time, but I think this new state will be
safer by a significant margin than the pre-early-open state, which we are still
seeing bug reports for in the 2.0 line, and has been the cause of many serious
bugs (and at least one major public downtime of a well known deployment). I
very much hope all of these changes will restore confidence in not only the
early open feature, but resource management in general, and hopefully reduce
the burden on all maintainers.
> Stop opening compaction results early
> -------------------------------------
>
> Key: CASSANDRA-8833
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Marcus Eriksson
> Fix For: 2.1.4
>
>
> We should simplify the code base by not doing early opening of compaction
> results. It makes it very hard to reason about sstable life cycles since they
> can be in many different states, "opened early", "starts moved", "shadowed",
> "final", instead of as before, basically just one (tmp files are not really
> 'live' yet so I don't count those). The ref counting of shared resources
> between sstables in these different states is also hard to reason about. This
> has caused quite a few issues since we released 2.1.
> I think it all boils down to a performance vs code complexity issue: is
> opening compaction results early really 'worth it' wrt the performance gain?
> The results in CASSANDRA-6916 sure look like the benefits are big enough, but
> the difference should not be as big for people on SSDs (which most people who
> care about latencies are).
> WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?