Re: State of Materialized Views

2017-07-25 Thread Josh McKenzie
Status of above is on our collective radars. As always, interleaving
reviews with other work is a challenge.

On Mon, Jul 24, 2017 at 7:05 PM, Nate McCall  wrote:

> >
> > We're working on the following MV-related issues in the 4.0 time-frame:
> > CASSANDRA-13162
> > CASSANDRA-13547
> Patch Available
>
> > CASSANDRA-13127
> Patch Available
>
> > CASSANDRA-13409
> Patch Available
>
> > CASSANDRA-12952
> Patch Available
>
> > CASSANDRA-13069
> > CASSANDRA-12888
> >
>
> Josh - want to make sure folks are not duplicating effort here, is the
> status of the above on your radar? Regardless, I appreciate the
> communication. Thanks for that!
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: State of Materialized Views

2017-07-24 Thread Nate McCall
>
> We're working on the following MV-related issues in the 4.0 time-frame:
> CASSANDRA-13162
> CASSANDRA-13547
Patch Available

> CASSANDRA-13127
Patch Available

> CASSANDRA-13409
Patch Available

> CASSANDRA-12952
Patch Available

> CASSANDRA-13069
> CASSANDRA-12888
>

Josh - want to make sure folks are not duplicating effort here, is the
status of the above on your radar? Regardless, I appreciate the
communication. Thanks for that!




Re: State of Materialized Views

2017-07-24 Thread Carlos Rolo
We have a couple of big deployments with MVs in production. I will try to
get some help in the form of testing and validation, and will do my best to
contribute to the codebase too.



Regards,

Carlos Juzarte Rolo
Cassandra Consultant / Datastax Certified Architect / Cassandra MVP

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +351 918 918 100
www.pythian.com

On Mon, Jul 24, 2017 at 3:48 PM, Josh McKenzie  wrote:

> >
> > Who is "we" in this case?
>
>
> Initial contributors (myself + Jake, Carl's no longer active on the
> project), Zhao, Andres, Paulo, Sylvain, etc. The people who are publicly,
> actively working on MV issues atm.
>
> On Mon, Jul 24, 2017 at 9:46 AM, benjamin roth  wrote:
>
> > Hi Josh,
> >
> > Who is "we" in this case?
> >
> > Best,
> > Ben
> >
> > 2017-07-24 15:41 GMT+02:00 Josh McKenzie :
> >
> > > >
> > > > The initial contributors turned their back on MVs
> > >
> > >
> > > We're working on the following MV-related issues in the 4.0 time-frame:
> > > CASSANDRA-13162
> > > CASSANDRA-13547
> > > CASSANDRA-13127
> > > CASSANDRA-13409
> > > CASSANDRA-12952
> > > CASSANDRA-13069
> > > CASSANDRA-12888
> > >
> > > We're also keeping our eye on CASSANDRA-13657
> > >
> > > This is by no means an exhaustive list, but we're hoping it'll help take
> > > care of some of the more pressing / critical issues with the feature.
> > > Automated de-normalization on a Dynamo EC architecture is a Hard Problem.
> > >
> > >
> > > On Thu, Jul 20, 2017 at 9:56 PM, kurt greaves 
> > > wrote:
> > >
> > > > I'm going to do my best to review all the changes Zhao is making under
> > > > CASSANDRA-11500,
> > > > but yeah definitely need a committer nominee as well. On that note, Zhao is
> > > > going to try address a lot of the current issues I listed above in #11500.
> > > > Thanks Zhao!
> > > >
> > >
> >
>


Re: State of Materialized Views

2017-07-24 Thread Josh McKenzie
>
> Who is "we" in this case?


Initial contributors (myself + Jake, Carl's no longer active on the
project), Zhao, Andres, Paulo, Sylvain, etc. The people who are publicly,
actively working on MV issues atm.

On Mon, Jul 24, 2017 at 9:46 AM, benjamin roth  wrote:

> Hi Josh,
>
> Who is "we" in this case?
>
> Best,
> Ben
>
> 2017-07-24 15:41 GMT+02:00 Josh McKenzie :
>
> > >
> > > The initial contributors turned their back on MVs
> >
> >
> > We're working on the following MV-related issues in the 4.0 time-frame:
> > CASSANDRA-13162
> > CASSANDRA-13547
> > CASSANDRA-13127
> > CASSANDRA-13409
> > CASSANDRA-12952
> > CASSANDRA-13069
> > CASSANDRA-12888
> >
> > We're also keeping our eye on CASSANDRA-13657
> >
> > This is by no means an exhaustive list, but we're hoping it'll help take
> > care of some of the more pressing / critical issues with the feature.
> > Automated de-normalization on a Dynamo EC architecture is a Hard Problem.
> >
> >
> > On Thu, Jul 20, 2017 at 9:56 PM, kurt greaves 
> > wrote:
> >
> > > I'm going to do my best to review all the changes Zhao is making under
> > > CASSANDRA-11500,
> > > but yeah definitely need a committer nominee as well. On that note, Zhao is
> > > going to try address a lot of the current issues I listed above in #11500.
> > > Thanks Zhao!
> > >
> >
>


Re: State of Materialized Views

2017-07-24 Thread benjamin roth
Hi Josh,

Who is "we" in this case?

Best,
Ben

2017-07-24 15:41 GMT+02:00 Josh McKenzie :

> >
> > The initial contributors turned their back on MVs
>
>
> We're working on the following MV-related issues in the 4.0 time-frame:
> CASSANDRA-13162
> CASSANDRA-13547
> CASSANDRA-13127
> CASSANDRA-13409
> CASSANDRA-12952
> CASSANDRA-13069
> CASSANDRA-12888
>
> We're also keeping our eye on CASSANDRA-13657
>
> This is by no means an exhaustive list, but we're hoping it'll help take
> care of some of the more pressing / critical issues with the feature.
> Automated de-normalization on a Dynamo EC architecture is a Hard Problem.
>
>
> On Thu, Jul 20, 2017 at 9:56 PM, kurt greaves 
> wrote:
>
> > I'm going to do my best to review all the changes Zhao is making under
> > CASSANDRA-11500,
> > but yeah definitely need a committer nominee as well. On that note, Zhao is
> > going to try address a lot of the current issues I listed above in #11500.
> > Thanks Zhao!
> >
>


Re: State of Materialized Views

2017-07-24 Thread Josh McKenzie
>
> The initial contributors turned their back on MVs


We're working on the following MV-related issues in the 4.0 time-frame:
CASSANDRA-13162
CASSANDRA-13547
CASSANDRA-13127
CASSANDRA-13409
CASSANDRA-12952
CASSANDRA-13069
CASSANDRA-12888

We're also keeping our eye on CASSANDRA-13657

This is by no means an exhaustive list, but we're hoping it'll help take
care of some of the more pressing / critical issues with the feature.
Automated de-normalization on a Dynamo EC architecture is a Hard Problem.


On Thu, Jul 20, 2017 at 9:56 PM, kurt greaves  wrote:

> I'm going to do my best to review all the changes Zhao is making under
> CASSANDRA-11500,
> but yeah definitely need a committer nominee as well. On that note, Zhao is
> going to try address a lot of the current issues I listed above in #11500.
> Thanks Zhao!
>


Re: State of Materialized Views

2017-07-20 Thread kurt greaves
I'm going to do my best to review all the changes Zhao is making under
CASSANDRA-11500, but yeah, we definitely need a committer nominee as well.
On that note, Zhao is going to try to address a lot of the current issues
I listed above in #11500.
Thanks Zhao!


Re: State of Materialized Views

2017-07-20 Thread Nate McCall
>
>  so perhaps the real solution is we need to be more aggressive about 
> nominating and electing committers who are willing to spend some attention on 
> MVs.
>

I am very much +1 on this solution.

Huge thanks to Kurt for the excellent summarization and to Benjamin
and ZhaoYang for all their recent development efforts.




Re: State of Materialized Views

2017-07-19 Thread Jeff Jirsa


On 2017-07-16 21:22 (-0700), kurt greaves  wrote: 
> wall of text inc.
> *tl;dr: *Aiming to come to some conclusions about what we are doing with
> MV's and how we are going to make them stable in production. But really
> just trying to raise awareness/involvement for MV's.
> 

I share your frustration, for what it's worth. And Ben's, too. That doesn't 
necessarily count for much, I'm afraid, but I sympathize.

> It seems we've got an excess of MV bugs that pretty much make them
> completely unusable in production, or at least incredibly risky and also
> limited. It also appears that we don't have many people totally across MV's
> either (or at least a lack of people currently looking at them). To avoid
> us "forgetting" about MV's I'd like to raise the current issues and get
> opinions on the direction we should go with MV's. I know historically there
> was a lot of discussion about this, but it seems a lot of the originally
> involved are currently less involved, and thus before making wild changes
> to MV's it might be worth going back to the start and think through the
> original requirements and implementation.
> 
> 
> If anyone has been working on any of these tickets and no longer is able
> to, either update the ticket or let me know and I'll either take over/find
> some other poor soul to have a stab at it.
> It would also be nice to get some volunteers who are familiar with MV's to
> review the above tickets.

Anyone want to admit to running them in prod? Any committers with an MV install 
base? Any non-trivial use cases? 

> 
> 
> My general advice these days is for users to steer clear of MV's for the
> moment, however we have no clear plan for when these will really be stable.
> I think as some of the changes to fix MV's may potentially require a major
> version change, we should at least aim to get all those in for 4.0
> (although still need to figure out what exactly these issues are).
> Interested to hear peoples thoughts.

I think you're probably right on here. I think they may work for people with 
suitably simple use cases (append only, no delete, writes with strong 
consistency, and use single token or few tokens per node).

I think the more clear point is that we need people willing to help step up and 
fix it. I don't use them in prod, and I don't actually know anyone who does 
(though clearly a few folks do, including the three or four folks who seem to 
actually be working on the tickets), so perhaps the real solution is we need to 
be more aggressive about nominating and electing committers who are willing to 
spend some attention on MVs. 




Re: State of Materialized Views

2017-07-17 Thread kurt greaves
Thanks for the input Benjamin. Sounds like you've come to a lot of the same
conclusions I have. I'm certainly keen on fixing up MV's and I don't really
see a way we could avoid it, as I know they are already widely being used
in production. I think we would have had a much easier time if we went with
a basic implementation (append only) first, but y'know, hindsight.
Unfortunately I'd say we're kind of stuck with fixing what we've got or
have a really angry userbase that jumps ship.

> *What I miss is a central consensus about "MV business rules" + a central
> set of proofs and or tests that support these rules and proof or falsify
> assumptions in a reproducible way.*

From what I gathered from JIRA the goals in my original post are the ones
outlined during initial development of MV's. The general design and goals
were also documented here,
however that doesn't completely cover the current state of MV's.
I'm with you that we certainly need a set of proofs/tests to support these
rules. At the moment a lot of the open tickets have patches that contribute
good tests that cover many cases however we're almost kind of defining
rules as we go (granted it is difficult when we need to test every possible
write you could make in Cassandra).

In regards to your "tickler", a colleague has been working on something
similar however we haven't deemed it quite production ready yet so we
haven't released it to the public. It may be useful to compare notes if
you're interested!



Re: State of Materialized Views

2017-07-17 Thread benjamin roth
Hi Kurt,

First of all thanks for this elaborate post.

At this moment, I don't want to come up with a solution for all MV issues
but I would like to point out, why I was quite active some time ago and why
I pulled myself back.

As you also mentioned in different words, it seems to me that MVs are an
orphan in CS. They started out as a shiny and promising feature, but ...
When I came to CS, MVs were one of the reasons why I gave CS in general and
3.0 in particular a try. But when I started to work with MVs in production -
willing to overcome the "little obstacles" and the fact they are "not quite
stable" - I started to realize that there is almost no support from the
community. The initial contributors turned their back on MVs. All that
remained is a 95% ready feature, a lot of public documentation but no
disclaimer that says "Please Do Not Use MVs". And every time when a
discussion pops up around MVs the bottom line is:

- All or most of the people involved do not have much experience with MVs
- Original contributors are not involved
- It seems to me, discussions are more based on assumptions or superficial
knowledge than on real knowledge/experience/research/proofs
- Bringing in code changes is difficult for the same reasons. Nobody likes
to take over the "old heritage" or take over responsibility for it. And it
seems that nobody feels confident enough to bring in critical changes
- I don't want to touch this critical part in the code path, I know we have
tests but ...

Initially I was very eager to contribute and to help MVs mature, but
over time it turned out to be very cumbersome and frustrating. Additionally
I have very little time left in my daily routine to work on CS. So I
decided to work on a solution that solved our specific problems with CS and
MVs. I am not really happy with it but it actually works quite well.

To be honest, I also had in the back of my head to write a posting similar
to yours. I would really like to contribute and bring MVs forward, but not
at all costs. I see many problems with MVs, even some that haven't even
been mentioned, yet. But I do not want to come up with half-baked
assumptions. What MVs really lack is a reproducible code-based proof of
what works and what does not. One example is the question "Why can I add
only a single column to an MV PK". I have read arguments that I think
are not quite right or "somehow incomplete". There are a lot of
arguments and discussions that are totally scattered across JIRA and it
seems to me that every contributor knows a little bit of this and a little
bit of that and remembers this post or that post. I was already thinking of
setting up a super-reduced "storage mock" to prove / find edge cases in MV
fail-and-repair scenarios to answer questions like these with code instead
of sentences like "I think that... " or "I can remember a comment of ...".
Unfortunately dtests are super painful for things like that because a) they
are f* slow and b) it is super complicated to simulate a certain situation. I
also did not see a simple way to do this with the CS unit test suite as I
didn't see a way to boot and control multiple storages there.
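
To make the "storage mock" idea concrete, here is a minimal sketch of what
such a harness could look like. Everything in it is an assumption made for
illustration: the Replica class, the last-write-wins rule, the local
read-before-write view update and the repair() helper are a toy model, not
Cassandra's actual view-update or repair code, and each toy replica holds
both the base table and its view.

```python
class Replica:
    """Toy replica: one base table plus a view keyed by (value, key)."""

    def __init__(self):
        self.base = {}  # key -> (value, timestamp)
        self.view = {}  # (value, key) -> (timestamp, is_live)

    def _view_put(self, value, key, ts, live):
        cur = self.view.get((value, key))
        # Last write wins; on a timestamp tie a tombstone beats a live row.
        if cur is None or ts > cur[0] or (ts == cur[0] and not live):
            self.view[(value, key)] = (ts, live)

    def write(self, key, value, ts):
        old = self.base.get(key)
        if old is not None and old[1] >= ts:
            return  # stale mutation, ignored by last-write-wins
        self.base[key] = (value, ts)
        if old is not None and old[0] != value:
            # Local read-before-write: tombstone the previous view row.
            self._view_put(old[0], key, ts, live=False)
        self._view_put(value, key, ts, live=True)

    def live_view(self):
        return {vk for vk, (_, live) in self.view.items() if live}


def repair(src, dst):
    """Hypothetical repair: replay src's base rows through dst's write path."""
    for key, (value, ts) in src.base.items():
        dst.write(key, value, ts)


def run_scenario():
    """Drop one mutation on r2, then repair; report view agreement."""
    r1, r2 = Replica(), Replica()
    for r in (r1, r2):
        r.write("k", "a", 1)
    r1.write("k", "b", 2)  # r2 never receives this mutation
    same_before = r1.live_view() == r2.live_view()
    repair(r1, r2)
    same_after = r1.live_view() == r2.live_view()
    return same_before, same_after
```

In this toy model the views disagree after the dropped mutation and agree
again after the replay. The value of such a mock would be in encoding each
disputed scenario as a function like run_scenario(), so that a claim about
MV behaviour becomes an assertion that either passes or fails.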

*What I miss is a central consensus about "MV business rules" + a central
set of proofs and/or tests that support these rules and prove or falsify
assumptions in a reproducible way.*

The reasons why I did not already come up with something like that:
- Time
- Frustration

If I can see that there are more people who feel like that and are willing
to work together to find a solid solution, my level of frustration could
turn into motivation again.

--
Last but not least for those who care:
One of the solutions I created was to implement our own version of Tickler
(full table scans with CL_ALL to enforce read repair) to get rid of these
damned built-in repairs which simply don't work well (especially) for MVs.
To only name a few numbers:
- We could bring down the repair time of a KS with RF=5 from 5 hours to 5
minutes. Really. I could not believe it.
- No more "compaction storms" or piling up compaction queues or compactions
falling behind
- No more SSTables piling up. Before it was normal that the number of
SSTables went up from 300-400 to 5000 and more. After: No noticeable
change. (Btw that was the reason for CASSANDRA-12730. This isn't even bound
to MVs, they maybe only amplify the impact of the underlying design)
- We now repair the whole cluster in 16h (10 nodes, 400-450 GB load each,
14 KS). Before we had single keyspaces that took more than a day to finish.
Sometimes they took even 3 days with reaper because of "Too many
compactions"
- It showed us problems in our model. We had data that was not readable at
all due to massive tombstones + read timeouts
... if someone is interested in more details, just ping me.

- Benjamin
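
The tickler approach described above (full table scans at CL_ALL to enforce
read repair) can be sketched roughly as follows. This is not Benjamin's
implementation, which is not shown in the thread. It assumes the
Murmur3Partitioner token range (-2^63 to 2^63-1), and the names split_ring,
tickle_queries and tickle, as well as the `execute` callable, are
hypothetical helpers introduced for illustration.

```python
MIN_TOKEN = -2**63      # assumption: Murmur3Partitioner token range
MAX_TOKEN = 2**63 - 1


def split_ring(n_splits):
    """Split the token ring into n_splits contiguous inclusive (lo, hi) ranges."""
    if n_splits < 1:
        raise ValueError("n_splits must be >= 1")
    step = (MAX_TOKEN - MIN_TOKEN + 1) // n_splits
    ranges = []
    lo = MIN_TOKEN
    for i in range(n_splits):
        # The last range absorbs the remainder so the ring is fully covered.
        hi = MAX_TOKEN if i == n_splits - 1 else lo + step - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def tickle_queries(keyspace, table, pk_columns, n_splits):
    """Build one (cql, bounds) pair per token sub-range."""
    pk = ", ".join(pk_columns)
    cql = (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE token({pk}) >= %s AND token({pk}) <= %s"
    )
    return [(cql, bounds) for bounds in split_ring(n_splits)]


def tickle(execute, keyspace, table, pk_columns, n_splits=1024):
    """Read every row once; `execute` is expected to run the query at CL ALL.

    Returns the number of rows read. Reading at ALL is what makes the
    replicas compare their data, so the caller must set that level.
    """
    rows_seen = 0
    for cql, bounds in tickle_queries(keyspace, table, pk_columns, n_splits):
        for _ in execute(cql, bounds):
            rows_seen += 1
    return rows_seen
```

With the DataStax Python driver, `execute` could for example be a small
wrapper that calls session.execute() with a SimpleStatement whose
consistency_level is ConsistencyLevel.ALL. Splitting the ring into many small
ranges keeps each query short, and a real tool would also need throttling and
retry handling, which this sketch leaves out.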


2017-07-17 6:22 GMT+02:00 kurt greaves :

> wall of text inc.
> *tl;dr: *Aiming to come to some conclusions about what we are doing with
> MV's and how we are going