So, I'm of the opinion there's a difference between users misusing a 
well-understood feature whose shortcomings are widely discussed in the 
community, and us providing a feature we don't fully understand, whose caveats 
we have not fully documented, whose problems we have not all discovered, and 
whose known issues haven't yet percolated fully into the wider community.

I also think there's a huge difference between users shooting themselves in the 
foot, and us shooting them in the foot.  

There's a degree of trust - undeserved - that goes with being a database.  
People assume you're smarter than them, and that it Just Works.  Given this, 
and that squandering this trust is a bad thing, I personally believe it is 
better to offer the feature as experimental until we iron out all of the 
problems, fully understand it, and have a wider community knowledge base around 
it.

We can still encourage users that can tolerate problems to use it, but we won't 
be giving any false assurances to those that can't.  Doesn't that seem like a 
win-win?



> On 3 Oct 2017, at 21:07, Jeremiah D Jordan <jeremiah.jor...@gmail.com> wrote:
> 
> So for some perspective here, how do users who do not get the guarantees of 
> MV's implement this on their own?  They use logged batches.
> 
> Pseudo CQL here, but you should get the picture:
> 
> If they don’t ever update data, they do it like so, and it is pretty safe:
> BEGIN BATCH
> INSERT tablea blah
> INSERT tableb blahview
> APPLY BATCH
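> 
> For concreteness, a fully written-out version of that insert-only pattern 
> might look something like this, assuming a hypothetical users table 
> denormalized by hand into a users_by_email table (any schema would do):
> 
> CREATE TABLE users (id uuid PRIMARY KEY, email text, name text);
> CREATE TABLE users_by_email (
>     email text,
>     id uuid,
>     name text,
>     PRIMARY KEY (email, id)
> );
> 
> -- logged batch: both inserts are eventually applied together (atomicity,
> -- not isolation), keeping the base table and the hand-rolled "view" in step
> BEGIN BATCH
>     INSERT INTO users (id, email, name) VALUES (?, ?, ?);
>     INSERT INTO users_by_email (email, id, name) VALUES (?, ?, ?);
> APPLY BATCH;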
> 
> If they do update data, they likely do it like so, and get it wrong in the 
> face of concurrency:
> SELECT * from tablea WHERE blah;
> 
> BEGIN BATCH
> INSERT tablea blah
> INSERT tableb blahview
> DELETE tableb oldblahview
> APPLY BATCH
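> 
> Spelled out against the same hypothetical users / users_by_email schema, that 
> racy version is roughly:
> 
> -- read first, to learn the old email (i.e. the old "view" key)
> SELECT id, email, name FROM users WHERE id = ?;
> 
> BEGIN BATCH
>     UPDATE users SET email = ? WHERE id = ?;
>     INSERT INTO users_by_email (email, id, name) VALUES (?, ?, ?);
>     DELETE FROM users_by_email WHERE email = ? AND id = ?;  -- the old email
> APPLY BATCH;
> 
> Two clients that read the same old email and update concurrently will each 
> insert their own users_by_email row, so whichever update loses on the base 
> table leaves an orphaned row behind in the hand-rolled view.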
> 
> A sophisticated user that understands the concurrency issues may well try to 
> implement it like so:
> 
> SELECT key, col1, col2 FROM tablea WHERE key=blah;
> 
> BEGIN BATCH
> UPDATE tablea col1=new1, col2=new2 WHERE key=blah IF col1=old1 and col2=old2
> UPDATE tableb viewc1=new2, viewc2=blah WHERE key=new1
> DELETE tableb WHERE key=old1
> APPLY BATCH
> 
> And it wouldn’t work because you can only use LWT in a BATCH if all updates 
> have the same partition key value, and the whole point of a view most of the 
> time is that it doesn't (and there are other issues with this, like most 
> likely needing to use uuid’s or something else to distinguish between 
> concurrent updates, that are not realized until it is too late).
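> 
> For comparison, the MV the server would maintain for them, against the same 
> hypothetical schema, is just:
> 
> CREATE MATERIALIZED VIEW users_by_email AS
>     SELECT id, email, name FROM users
>     WHERE email IS NOT NULL AND id IS NOT NULL
>     PRIMARY KEY (email, id);
> 
> which is exactly the bookkeeping the batches above are trying to approximate 
> by hand.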
> 
> A user who does not dig in and understand how MV's work most likely also 
> does not dig in to understand the trade-offs and drawbacks of logged batches 
> to multiple tables across different partition keys, or even necessarily of 
> read-before-write, concurrent updates, and the races inherent in them.  I 
> would guess that using MV's, even as they are today, is *safer* for these 
> users than rolling their own.  I have seen these patterns implemented by 
> people many times, including the “broken in the face of concurrency” version. 
> So let's please not try to argue that a casual user who does not dig in to 
> the specifics of feature A is going to dig in and understand the specifics of 
> any other feature.  So yes, I would prefer my bank to use MV's as they are 
> today over rolling their own and getting it even more wrong.
> 
> Now, even given all that, if we want to warn users of the pitfalls of using 
> MV's, then let's do that.  But let's keep some perspective on how things 
> actually get used.
> 
> -Jeremiah
> 
>> On Oct 3, 2017, at 8:12 PM, Benedict Elliott Smith <_...@belliottsmith.com> 
>> wrote:
>> 
>> While many users may apparently be using MVs successfully, the problem is 
>> how few (if any) know what guarantees they are getting.  Since we aren’t 
>> even absolutely certain ourselves, it cannot be many.  Most of the 
>> shortcomings we are aware of are complicated, concern failure scenarios and 
>> aren’t fully explained; i.e. if you’re lucky they’ll never be a problem, but 
>> some users must surely be bitten, and they won’t have had fair warning.  The 
>> same goes for as-yet undiscovered edge cases.
>> 
>> It is my humble opinion that averting problems like this for just a handful 
>> of users who cannot readily tolerate corruption offsets any inconvenience we 
>> might cause to those who can.
>> 
>> For the record, while it’s true that detecting inconsistencies is as much of 
>> a problem for user-rolled solutions, it’s worth remembering that the 
>> inconsistencies themselves are not equally likely:
>> 
>> - In cases where C* is not the database of record, it is quite easy to 
>> provide very good consistency guarantees when rolling your own.
>> - Conversely, a global-CAS with synchronous QUORUM updates that are retried 
>> until success, while much slower, also doesn't easily suffer these 
>> consistency problems, and is the naive approach a user might take if C* were 
>> the database of record (sketched below).
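>> 
>> As a rough sketch of one way to realise that second approach (assuming a 
>> hypothetical users table denormalized by hand into a users_by_email table, 
>> with the CAS and the retry loop living in the application):
>> 
>> -- CAS on the base table, retried until it applies:
>> UPDATE users SET email = :new_email WHERE id = :id IF email = :old_email;
>> -- then synchronously at QUORUM, retried until success:
>> DELETE FROM users_by_email WHERE email = :old_email AND id = :id;
>> INSERT INTO users_by_email (email, id, name)
>>     VALUES (:new_email, :id, :name);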
>> 
>> Given our approach isn’t uniformly superior, I think we should be very 
>> cautious about how it is made available until we’re very confident in it, 
>> and we and the community fully understand it.
>> 
>> 
>>> On 3 Oct 2017, at 18:51, kurt greaves <k...@instaclustr.com> wrote:
>>> 
>>> Lots of users are already using MV's, believe it or not in some cases quite
>>> effectively, and also on older versions which were still exposed to a lot of
>>> the bugs that cause inconsistencies. 3.11.1 has come a long way since then,
>>> and I think that with a bit more documentation around the current issues,
>>> marking MV's as experimental is unnecessary and likely annoying for current
>>> users. On that note, we've already had complaints about changing defaults
>>> and behaviours willy-nilly across majors and minors, and I can't see this
>>> helping our cause. Sure, you can make it "seamless" from an upgrade
>>> perspective, but that doesn't account for every single way operators do
>>> things. I'm sure someone will express surprise when they run up a new
>>> cluster or datacenter for testing with default config and find out that they
>>> have to enable MV's. Meanwhile, they've been using them the whole time and
>>> haven't had any major issues because they didn't touch the edge cases.
>>> 
>>> I'd like to point out that introducing "experimental" features sets a
>>> precedent for future releases, and will likely result in using the
>>> "experimental" tag to push out features that are not ready (again). In fact
>>> we already routinely say >=3 isn't production ready yet, so why don't we
>>> just mark 3+ as "experimental" as well? I don't think experimental is the
>>> right approach for a database. The better solution, as I said, is more
>>> verification and testing during the release process (by users!). A lot of
>>> other projects take this approach, and it certainly makes sense. It could
>>> also be coupled with beta releases, so people can start getting
>>> verification of their new features at an earlier date. Granted this is
>>> similar to experimental features, but applied to the whole release rather
>>> than just individual features.
>>> 
>>>> * There's no way to determine if a view is out of sync with the base table.
>>>>
>>> As already pointed out by Jake, this is still true when you don't use
>>> MV's. We should document this. I think it's entirely fair to say that
>>> users *should not* expect this to be done for them. There is also no way
>>> for a user to determine they have inconsistencies short of their own
>>> verification. Also, a lot of the synchronisation problems have been
>>> resolved; undoubtedly there are more unknowns out there, but what MV's
>>> offer is still better than managing your own.
>>> 
>>>> * If you do determine that a view is out of sync, the only way to fix it
>>>> is to drop and rebuild the view.
>>>> 
>>> This is undoubtedly a problem, but also no worse than managing your own
>>> views, and at least there is still a way to fix your view. It certainly
>>> shouldn't be as common in 3.11.1/3.0.15, and we have enough insight now to
>>> be able to tell when a view will actually go out of sync, so we can
>>> document those cases.
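>>> 
>>> For reference, "drop and rebuild" is just DDL, e.g. for a hypothetical
>>> users_by_email view:
>>> 
>>> DROP MATERIALIZED VIEW users_by_email;
>>> CREATE MATERIALIZED VIEW users_by_email AS
>>>     SELECT id, email, name FROM users
>>>     WHERE email IS NOT NULL AND id IS NOT NULL
>>>     PRIMARY KEY (email, id);
>>> 
>>> and the rebuild can then be watched with nodetool viewbuildstatus (in
>>> versions that ship it) while the view repopulates from the base table.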
>>> 
>>>> * There are liveness issues with updates being reflected in the view.
>>> 
>>> What specific issues are you referring to here? The only one I'm aware of
>>> is deletion of unselected columns in the view affecting out-of-order
>>> updates. If we deem this a major problem, we can document it or at least
>>> put a restriction in place until it's fixed in CASSANDRA-13826
>>> <https://issues.apache.org/jira/browse/CASSANDRA-13826>
>>> 
>>>> In this case, 'out of sync' means 'you lost data', since the current design
>>>> + repair should keep things eventually consistent right?
>>> 
>>> I'd like Zhao or Paulo to confirm here, but I believe the only way you can
>>> really "lose data" (that can't be repaired) here would be partition
>>> deletions on massively wide rows in the view that will not fit in the
>>> batchlog (256MB/max value size) as it currently stands. Frankly, this is
>>> probably an anti-pattern for MV's at the moment anyway, and one we should
>>> advise against.
> 
> 

