+1 to making DROP COMPACT STORAGE experimental (i.e. disabling it by default) w/ a link to the docs explaining the possible side effects.
The sooner we do that, the more defensible https://issues.apache.org/jira/browse/CASSANDRA-16675 (a proposed solution to the query performance issue mentioned above) becomes.

On Mon, Jun 7, 2021 at 3:07 AM Oleksandr Petrov <oleksandr.pet...@gmail.com> wrote:

> Thank you for bringing this subject up.
>
> > not ready for production use unless users fully understand what they are
> > doing.
>
> Thing is, there's no easy way around dropping compact storage. At the
> moment of writing of 10857, we have collectively decided that we'll
> document that the new columns are going to be shown, and have added a
> client protocol option that would hide/show columns depending on the mode
> we're running it in for anyone who upgrades. This makes it harder to make
> a transition for anyone who controls only the server side, since you have
> to account for how clients would behave whenever they see a new column. We
> did try to patch around the shown columns, but because of ColumnFilter this
> also turned out to be non-trivial, or at least not worth the effort for the
> moment.
>
> One of the things mentioned in this list (primary key liveness) is also
> existing as a difference between UPDATE and INSERT, but I'm not sure if
> it's properly documented. Similar to some other nuances, such as nulls in
> clustering keys on partitions that only have a static row. We did recently
> discuss some of these not-commonly-known cases with Benjamin and some other
> folks. So it might be worth documenting those, too.
>
> Problem with compact storage is that very few people want to touch it, and
> it requires a non-trivial amount of "institutional" knowledge and
> remembering things about Thrift. I think it's OK to mark the feature as
> experimental, but remembering how we haven't made significant improvements
> to things we have previously marked as experimental, this one may not
> materialise into something final, too.
>
> What would a complete, non-foot-gun solution for dropping compact storage
> entail? If we're talking about avoiding showing columns to users, there are
> ways to achieve this without rewriting sstables, for example, by
> introducing "hidden" columns in table metadata. However, if we want to
> preserve deletion semantics, I'm not sure if we're doing it right at all:
> we'll just trade one source of difference for partition liveness for insert
> queries for the other, so I'd say that, by executing ALTER TABLE statement,
> you're accepting that after it propagates, there will be at least some
> difference in behaviour and semantics. We did discuss this in C-16069, and
> my thesis back then was that replacing special-casing for compact tables
> with special casing for tables that "used to be compact" isn't bringing us
> closer to the final solution.
>
> To summarise, I don't mind if we mark this feature experimental, but if we
> want to ever make it complete, we have to discuss what we do with each of
> the special cases. And it may very well be that we just need to add
> explicit hidden columns to metadata, and allow nulls for clusterings, maybe
> several more small changes. Unless we define what it would take to get this
> feature out of experimental state, and actually make an effort to resolve
> these issues, I'd just put a huge warning and call it a power-user feature.
>
>
> On Fri, Jun 4, 2021 at 5:01 PM Joshua McKenzie <jmcken...@apache.org>
> wrote:
>
> > >
> > > not ready for production use unless users fully understand what they
> > > are doing.
> >
> > This statement stood out to me - in my opinion we should think carefully
> > about the surface area of the user interfaces on new features before we
> > add more cognitive burden to our users. We already have plenty of
> > "foot-guns" in the project and should only add more if absolutely
> > necessary.
> >
> > Further, marking this as experimental would be another feature we've
> > released and then retroactively marked as experimental; that's a habit we
> > should not get into.
> >
> > On balance, my .02 is the benefits to our end users and operators of
> > getting 4.0 to GA outweigh the costs of flagging this as experimental now
> > so I'm a +1 to the flagging idea, but I think there's some valuable
> > lessons for us to learn in retrospect from not just this feature but
> > others like it in the past.
> >
> > Curious to hear Alex' thoughts about this situation in particular as
> > author of C-10857. I recall that being a pretty painful slog so apologies
> > in advance for picking at this scab. :)
> >
> >
> > On Fri, Jun 4, 2021 at 9:44 AM Brandon Williams <dri...@gmail.com>
> > wrote:
> >
> > > +1
> > >
> > > On Fri, Jun 4, 2021, 3:53 AM Benjamin Lerer <ble...@apache.org> wrote:
> > >
> > > > Hi everybody,
> > > >
> > > > There is a significant number of issues with DROP COMPACT STORAGE
> > > > that can be pretty surprising for users.
> > > > To name a few:
> > > > * Some hidden columns will show up, changing the resultset returned
> > > >   for wildcard queries
> > > > * As COMPACT tables did not have primary key liveness, the empty rows
> > > >   inserted AFTER the ALTER will be returned whereas the ones inserted
> > > >   before the ALTER will not.
> > > > * Also due to the lack of primary key liveness, the number of SSTables
> > > >   being read will increase, resulting in slower queries
> > > > * After DROP COMPACT it becomes possible to ALTER the table in a way
> > > >   that makes all the rows disappear
> > > > * There is a loss of functionality around null clustering when
> > > >   dropping compact storage (CASSANDRA-16069)
> > > >
> > > > In my opinion DROP COMPACT STORAGE is not ready for production use
> > > > unless users fully understand what they are doing.
> > > > By consequence, I am wondering if we should not mark it as
> > > > experimental as we did for the Materialized Views (CASSANDRA-13959).
> > > >
> > > > What is your opinion?
> > > >
> > >
> >
>
> --
> alex p
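
For anyone skimming the thread later: the primary key liveness point (empty rows inserted before vs. after the ALTER, and the INSERT/UPDATE difference) can be illustrated with a toy model. This is only a rough Python sketch under stated assumptions, not Cassandra code. `Row`, `ToyTable` and `has_pk_liveness` are hypothetical names, and the rule "a row is returned if it has a row marker or at least one non-null cell" is a simplification of the real read path.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Row:
    """Toy stand-in for a stored row (hypothetical, not a Cassandra class)."""
    has_pk_liveness: bool = False  # "row marker" that INSERT writes on non-compact tables
    cells: Dict[str, Optional[str]] = field(default_factory=dict)

    def is_live(self) -> bool:
        # Simplified rule: live if there is a row marker or any non-null cell.
        return self.has_pk_liveness or any(v is not None for v in self.cells.values())


class ToyTable:
    def __init__(self, compact: bool) -> None:
        self.compact = compact
        self.rows: Dict[str, Row] = {}

    def insert(self, key: str, **cells: Optional[str]) -> None:
        row = self.rows.setdefault(key, Row())
        # Assumption: a compact table never writes primary key liveness.
        if not self.compact:
            row.has_pk_liveness = True
        row.cells.update(cells)

    def update(self, key: str, **cells: Optional[str]) -> None:
        # In this model UPDATE never writes a row marker, compact or not.
        self.rows.setdefault(key, Row()).cells.update(cells)

    def drop_compact_storage(self) -> None:
        # Only the table flag flips; rows already written are not rewritten.
        self.compact = False

    def select_keys(self) -> List[str]:
        return sorted(k for k, r in self.rows.items() if r.is_live())


table = ToyTable(compact=True)
table.insert("before", v=None)   # empty row written while the table is compact
table.drop_compact_storage()
table.insert("after", v=None)    # empty row written after the ALTER
table.update("updated", v=None)  # UPDATE that leaves no live cells
print(table.select_keys())       # ['after']
```

In this model only the row inserted after the ALTER comes back, which matches the behaviour described in the original message. The other items on the list (hidden columns, extra SSTable reads) are not modelled here.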