[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360188#comment-14360188
 ] 

Benedict commented on CASSANDRA-8833:
-------------------------------------

At the risk of sounding like a broken record (in case my earlier statement was 
missed): bloom filter resizing would need to break this assumption also, and is 
sort of intrinsically linked to the discussion around summary resizing. Whether 
or not we want this is another matter I'll leave aside for the moment. 

Just to outline the remainder of my strategy for mitigating any and all of 
these risks:

* In CASSANDRA-8568, I will:
** introduce a "stable" set of sstables that never changes, and all 
non-hot-path accesses will use these so they don't risk confusion, and don't 
have to worry about first/last issues
** ensure compaction strategies are only informed of changes to this "stable" 
set of readers
** remove the "shadowed" state of an sstable
** make the modification of tracker state transactional and more declarative, 
so both easier to follow and much harder to let get into a bad state
* CASSANDRA-8893, CASSANDRA-7066 and some related work will:
** eliminate the distinction between early open, temporary, and final files on 
disk, so eliminate at least one layer of the cleanup logic (i.e. make its 
requirements equivalent to summary/bf resizing)
** which also permits us to simplify the early open logic, by special casing it 
much less
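
To make the "stable" set idea concrete, here is a minimal sketch in Java. It is only an illustration under my own assumptions, not the CASSANDRA-8568 design: the class `StableViewSketch`, its methods, and the use of plain strings for sstables are all hypothetical. It shows one way a tracker could keep a stable view that only changes when a compaction completes, while the hot-path (live) view also sees early-opened results, and how notifications could be tied to the stable view only.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical names throughout; this is not Cassandra's tracker code.
final class StableViewSketch {
    private final Set<String> stable = new HashSet<>(); // complete, final sstables only
    private final Set<String> live = new HashSet<>();   // what hot-path reads would see
    private int stableChangeNotifications = 0;          // stands in for compaction-strategy callbacks

    // A finished sstable (e.g. from a flush) enters both views.
    synchronized void addStable(String sstable) {
        stable.add(sstable);
        live.add(sstable);
        stableChangeNotifications++;
    }

    // Early open: only the live view changes, and nobody is notified.
    synchronized void openEarly(String partial) {
        live.add(partial);
    }

    // Compaction finished: both views swap to the final result in one step.
    synchronized void finish(Set<String> replaced, String partial, String result) {
        live.remove(partial);
        live.removeAll(replaced);
        live.add(result);
        stable.removeAll(replaced);
        stable.add(result);
        stableChangeNotifications++;
    }

    // Callers get immutable snapshots, so later changes cannot confuse them.
    synchronized Set<String> stableView() {
        return Collections.unmodifiableSet(new HashSet<>(stable));
    }

    synchronized Set<String> liveView() {
        return Collections.unmodifiableSet(new HashSet<>(live));
    }

    synchronized int notifications() {
        return stableChangeNotifications;
    }
}
```

In this sketch a non-hot-path caller that uses `stableView()` never observes a partially written result, and the notification counter only moves when the stable view changes.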

In conjunction with the major overhaul of resource cleanup, AFAICT this 
mitigates most of the problems: 

* resource counting is now much easier to reason about, and soon will be even 
easier. It is also safer to get it wrong.
* only paths we know are safe to use overlapping sstables will do so (and in 
parallel we also enforce the non-overlapping rule)
* compaction doesn't have to even be aware it is happening
* what I think has been the biggest problem, the actual safe application of 
state changes (which were never atomic and could actually screw themselves up 
willfully through assertions) will be transactional and ensure exceptions do 
not interrupt their execution. It will also encapsulate its own safe rollback, 
so if we screw up somewhere, it will fix it for us.
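
As a rough illustration of the last point, the sketch below shows one common way to make a multi-step state change transactional with its own rollback. It is a sketch under my own assumptions, not the code planned for CASSANDRA-8568: `StateChangeSketch`, `Step`, and `step(...)` are hypothetical names, and each step is assumed to either fully apply or leave state untouched when it throws.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration; none of these names exist in Cassandra.
final class StateChangeSketch {
    // One declared change, paired with the action that reverses it.
    interface Step {
        void apply();
        void undo();
    }

    private final List<Step> steps = new ArrayList<>();

    static Step step(Runnable apply, Runnable undo) {
        return new Step() {
            public void apply() { apply.run(); }
            public void undo() { undo.run(); }
        };
    }

    // Changes are declared up front; nothing runs until commit().
    StateChangeSketch add(Step s) {
        steps.add(s);
        return this;
    }

    // Returns true if every step applied, false if the change was rolled back.
    boolean commit() {
        Deque<Step> applied = new ArrayDeque<>();
        try {
            for (Step s : steps) {
                s.apply();
                applied.push(s); // remember only steps that fully applied
            }
            return true;
        } catch (RuntimeException failure) {
            // Undo in reverse order; a failing undo must not stop the others.
            while (!applied.isEmpty()) {
                try {
                    applied.pop().undo();
                } catch (RuntimeException e) {
                    failure.addSuppressed(e);
                }
            }
            return false;
        }
    }
}
```

The design choice here is that an exception in one step cannot leave the earlier steps half-applied: either all steps take effect, or the ones that ran are reversed in the opposite order.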

I don't pretend it'll be 100% right the first time, but I think this new state will be 
safer by a significant margin than the pre-early-open state, which we are still 
seeing bug reports for in the 2.0 line, and which has been the cause of many serious 
bugs (and at least one major public downtime of a well known deployment). I 
very much hope all of these changes will restore confidence in not only the 
early open feature, but resource management in general, and hopefully reduce 
the burden on all maintainers.

> Stop opening compaction results early
> -------------------------------------
>
>                 Key: CASSANDRA-8833
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>             Fix For: 2.1.4
>
>
> We should simplify the code base by not doing early opening of compaction 
> results. It makes it very hard to reason about sstable life cycles since they 
> can be in many different states, "opened early", "starts moved", "shadowed", 
> "final", instead of as before, basically just one (tmp files are not really 
> 'live' yet so I don't count those). The ref counting of shared resources 
> between sstables in these different states is also hard to reason about. This 
> has caused quite a few issues since we released 2.1.
> I think it all boils down to a performance vs code complexity issue: is 
> opening compaction results early really 'worth it' wrt the performance gain? 
> The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
> the difference should not be as big for people on SSDs (which most people who 
> care about latencies are).
> WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



