[ 
https://issues.apache.org/jira/browse/CASSANDRA-18134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688571#comment-17688571
 ] 

C. Scott Andreas commented on CASSANDRA-18134:
----------------------------------------------

I disagree that downgradeability is a concern that should be pushed to a 
separate ticket. It's important that we address all aspects of SSTable format 
changes in the ticket that proposes them rather than deferring to future work.

Fortunately, there are a couple of great strategies for providing 
downgradeability, and they're not mutually exclusive.
 * One is to implement forward-compatibility. That would involve implementing 
the ability to read "-oa" format SSTables in 4.x. This would satisfy the 
property of downgradeability such that a user who has upgraded to 5.x can 
safely revert to a 4.x build (say, 4.2) that is capable of reading the new 
format.
 * Another is to adopt a flag that an operator sets to begin writing the new 
format once they have determined that their clusters are sufficiently stable 
post-upgrade. This is an approach that HDFS has adopted. Following a rolling 
upgrade of HDFS, downgrade remains possible until an operator executes a 
"finalize" operation to migrate NameNode metadata to the new version's format.
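
The second strategy can be sketched roughly as follows. This is only an 
illustration of the idea, not code from Cassandra or HDFS: the class 
WriteFormatGate, its method names, and the "previous" format identifier are 
all hypothetical, and a real implementation would need to persist the flag 
and coordinate it across the cluster.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical sketch of a "finalize" gate: after an upgrade the node keeps
 * writing the previous SSTable format until an operator explicitly finalizes.
 * None of these names exist in Cassandra; they are assumptions for illustration.
 */
final class WriteFormatGate {
    private final String previousFormat; // format the older release can read
    private final String newFormat;      // e.g. "oa", the format proposed in the ticket
    private final AtomicBoolean finalized = new AtomicBoolean(false);

    WriteFormatGate(String previousFormat, String newFormat) {
        this.previousFormat = previousFormat;
        this.newFormat = newFormat;
    }

    /** Format to use for newly written SSTables. */
    String formatForNewSSTables() {
        return finalized.get() ? newFormat : previousFormat;
    }

    /**
     * Operator-triggered, one-way switch to the new format.
     * Returns true only for the call that actually performed the switch.
     */
    boolean finalizeUpgrade() {
        return finalized.compareAndSet(false, true);
    }

    /** In this sketch, downgrade is considered safe until finalization. */
    boolean isDowngradeSafe() {
        return !finalized.get();
    }
}
```

Until finalizeUpgrade() is called, every new SSTable is still written in the 
older format, so reverting to the earlier build stays possible.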

One of the biggest hurdles in completing 4.0 upgrades was explaining to 
hundreds of people that the upgrade is a completely irreversible, one-way trip. 
The vast majority of upgrades went completely fine, but those that didn't had 
some very unpleasant follow-ups due to the inability to back them out.

I'd love to see us approach SSTable format changes with the dual approaches 
described above: forward-compatibility in a previous release; and a flag to 
adopt the new data format post-upgrade.

> Improve handling of min/max clustering in sstable
> -------------------------------------------------
>
>                 Key: CASSANDRA-18134
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18134
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/SSTable
>            Reporter: Jacek Lewandowski
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>             Fix For: 5.x
>
>
> This patch improves the following things:
> # SSTable metadata will store a covered slice instead of min/max clusterings. 
> The difference is that a slice records the type of each bound rather than 
> just a clustering. In particular, it provides the information whether the 
> lower and upper bounds of an sstable are open or closed.
> # SSTable metadata will store a flag whether the SSTable contains any 
> partition level deletions or not
> # The above two changes required introducing a new major format for SSTables 
> - {{oa}}
> # Single partition read command makes use of the above changes. In particular, 
> an sstable can be skipped when it does not intersect with the column filter, 
> does not have partition level deletions and does not have statics. If there 
> are partition level deletions but the other conditions are satisfied, only 
> the partition header needs to be accessed (tests attached)
> # Skipping sstables when those three conditions are satisfied has also been 
> implemented for partition range queries (tests attached). Also added a 
> minor separate statistic to record the number of accessed sstables in 
> partition reads, because now not all of them need to be accessed. That 
> statistic is also needed in tests to confirm skipping.
> # Artificial lower bound marker is now an object on its own and is not 
> implemented as a special case of range tombstone bound. Instead it sorts 
> right before the lowest available bound in the data
> # Extended the lower bound optimization usage due to items 1 and 2
> # Do not initialize iterator just to get a cached partition and associated 
> columns index. The purpose of using lower bound optimization was to avoid 
> opening an iterator of an sstable if possible.
> See also CASSANDRA-14861
> The changes in this patch include work of [~blambov], [~slebresne], 
> [~jakubzytka] and [~jlewandowski]
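
For readers skimming the description, the skipping rule in item 4 can be read 
as a small decision function. The sketch below is a simplified reading of that 
text, not the patch itself: the class SSTableSkipDecision, the Access enum, and 
the boolean parameters are hypothetical names, and the real read path works on 
slices and metadata rather than precomputed booleans.

```java
/**
 * Hypothetical sketch of the per-sstable decision described in item 4.
 * Names and parameters are assumptions for illustration only.
 */
final class SSTableSkipDecision {
    enum Access { SKIP, PARTITION_HEADER_ONLY, FULL_READ }

    static Access decide(boolean intersectsFilter,
                         boolean hasPartitionLevelDeletions,
                         boolean hasStatics) {
        // Any overlap with the filter, or any static data, means the
        // sstable cannot be skipped in this simplified model.
        if (intersectsFilter || hasStatics) {
            return Access.FULL_READ;
        }
        // No overlap and no statics: only a partition level deletion
        // could still matter, and it lives in the partition header.
        return hasPartitionLevelDeletions ? Access.PARTITION_HEADER_ONLY
                                          : Access.SKIP;
    }
}
```

For example, decide(false, true, false) yields PARTITION_HEADER_ONLY, which 
corresponds to the "only the partition header needs to be accessed" case.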



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
