No. You just change the partitioner. That's all.

On 05.03.2017 09:15, "DuyHai Doan" <doanduy...@gmail.com> wrote:
> "How can that be achieved? I haven't done "scientific research" yet, but I guess an "MV partitioner" could do the trick. Instead of applying the regular partitioner, an MV partitioner would calculate the PK of the base table (which is always possible) and then apply the regular partitioner."
>
> The main purpose of MV is to avoid the drawbacks of the 2nd-index architecture, e.g. having to scan a lot of nodes to fetch the results.
>
> With MV, since you give the partition key, the guarantee is that you'll hit a single node.
>
> Now if you put MV data on the same node as the base table data, you're doing more or less the same thing as a 2nd index.
>
> Let's take a dead simple example:
>
> CREATE TABLE user (user_id uuid PRIMARY KEY, email text);
>
> CREATE MATERIALIZED VIEW user_by_email AS
>   SELECT * FROM user
>   WHERE user_id IS NOT NULL AND email IS NOT NULL
>   PRIMARY KEY ((email), user_id);
>
> SELECT * FROM user_by_email WHERE email = 'xxx';
>
> With this query, how can you find the user_id that corresponds to email 'xxx' so that your MV partitioner idea can work?
>
> On Sun, Mar 5, 2017 at 9:05 AM, benjamin roth <brs...@gmail.com> wrote:
>
> > While I was reading the MV paragraph in your post, an idea popped up:
> >
> > The problem with MV inconsistencies and inconsistent range movements is that the "MV contract" is broken. This only happens because base data and replica data reside on different hosts. If base data + replicas stayed on the same host, then a rebuild/remove would always stream both matching parts of a base table + MV.
> >
> > So my idea:
> > Why not make a replica ALWAYS stay local, regardless of where the token of an MV would point? That would solve these problems:
> > 1. Rebuild / remove node would not break the MV contract
> > 2. A write always stays local:
> >
> > a) That means replication happens synchronously.
> > That means a quorum write to the base table guarantees instant data availability with a quorum read on the view.
> >
> > b) It saves network round trips + request/response handling and helps to keep a cluster healthier in case of bulk operations (like repair streams or rebuild streams). Write load stays local and is not spread across the whole cluster. I think it makes the load in these situations more predictable.
> >
> > How can that be achieved? I haven't done "scientific research" yet, but I guess an "MV partitioner" could do the trick. Instead of applying the regular partitioner, an MV partitioner would calculate the PK of the base table (which is always possible) and then apply the regular partitioner.
> >
> > I'll create a proper Jira for it on Monday. Currently it's Sunday here and my family wants me back, so just a few thoughts on this right now.
> >
> > Any feedback is appreciated!
> >
> > 2017-03-05 6:34 GMT+01:00 Edward Capriolo <edlinuxg...@gmail.com>:
> >
> > > On Sat, Mar 4, 2017 at 10:26 AM, Jeff Jirsa <jji...@gmail.com> wrote:
> > >
> > > > On Mar 4, 2017, at 7:06 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> > > >
> > > > > On Fri, Mar 3, 2017 at 12:04 PM, Jeff Jirsa <jji...@gmail.com> wrote:
> > > > >
> > > > > > On Fri, Mar 3, 2017 at 5:40 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> > > > > >
> > > > > > > I used them. I built do-it-yourself secondary indexes with them. They have their gotchas, but so do all the secondary index implementations. Just because DataStax does not write about something doesn't mean nobody uses it. Let's see, like 5 years ago there was this: https://github.com/hmsonline/cassandra-triggers
> > > > > >
> > > > > > Still in use? How'd it work? Production ready? Would you still do it that way in 2017?
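[Editor's note] The routing trade-off debated earlier in the thread (Benjamin's "local MV replica" idea vs. DuyHai's single-node-read guarantee) can be sketched with a toy hash ring. Everything below is illustrative: the 8-node ring, the md5-based token, and the function names are assumptions for the sketch, not Cassandra's real Murmur3Partitioner or read path.

```python
# Toy model: where does a read of user_by_email have to go?
import hashlib

NODES = [f"node{i}" for i in range(8)]  # assumed 8-node cluster, RF=1

def token(partition_key: str) -> int:
    # Stand-in for Cassandra's partitioner (really Murmur3, not md5)
    return int(hashlib.md5(partition_key.encode()).hexdigest(), 16)

def owner(partition_key: str) -> str:
    return NODES[token(partition_key) % len(NODES)]

def mv_read_regular(email: str) -> list:
    # Regular MV partitioning: the view row is placed by its own partition
    # key (email), so a reader who only knows the email hits one node.
    return [owner(email)]

def mv_read_local(email: str) -> list:
    # "Local MV partitioner": the view row is placed by the *base* PK
    # (user_id). The writer knows user_id, so the write stays local -- but a
    # reader who only has the email cannot compute that token and must ask
    # every node: the secondary-index scatter-gather MV was meant to avoid.
    return list(NODES)

if __name__ == "__main__":
    print(len(mv_read_regular("alice@example.com")))  # 1 node contacted
    print(len(mv_read_local("alice@example.com")))    # 8 nodes contacted
```

The sketch only models read routing; it ignores replication factor and Benjamin's streaming-consistency argument, which is about writes and rebuilds rather than reads.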
> > > > > > > There is a fairly large divergence between what actual users do and what other groups "say" actual users do, in some cases.
> > > > > >
> > > > > > A lot of people don't share what they're doing (for business reasons, or because they don't think it's important, or because they don't know how/where), and that's fine, but it makes it hard for anyone to know what features are used, or how well they're really working in production.
> > > > > >
> > > > > > I've seen a handful of "how do we use triggers" questions in IRC, and they weren't unreasonable questions, but they seemed like a lot of pain, and more than one of those people ultimately came back and said they used some other mechanism (and of course, some of them silently disappear, so we have no idea if it worked or not).
> > > > > >
> > > > > > If anyone's actively using triggers, please don't keep it a secret. Knowing that they're being used would be a great way to justify continuing to maintain them.
> > > > > >
> > > > > > - Jeff
> > > > >
> > > > > "Still in use? How'd it work? Production ready? Would you still do it that way in 2017?"
> > > > >
> > > > > I mean, that is a loaded question. How long has Cassandra had secondary indexes? Did they work well? Would you use them? How many times were they re-written?
> > > >
> > > > It wasn't really meant to be a loaded question; I was being sincere.
> > > >
> > > > But I'll answer: secondary indexes suck for many use cases, but they're invaluable for their actual intended purpose, and I have no idea how many times they've been rewritten, but they're production ready for their narrow use case (defined by cardinality).
> > > > Is there a real triggers use case still? An alternative to MVs? An alternative to CDC? I've never implemented triggers - since you have, what's the level of surprise for the developer?
> > >
> > > :) You mention alternatives: let's break them down.
> > >
> > > MV:
> > > They seem to have a lot of promise, i.e. you can use them for things other than equality searches, and I do think the CQL example with the top-N high scores is pretty useful. Then again, our buddy Mr. Roth has a thread named "Rebuild / remove node with MV is inconsistent". I actually think a lot of the use cases for MV fall into the category of "something you should actually be doing with Storm". I can vibe with the concept of not needing a streaming platform, but I KNOW Storm would do this correctly. I don't want to land on something like 2nd index v1/v2, where there were fundamental flaws at scale. (Not saying this is the case, but the rebuild thing seems a bit scary.)
> > >
> > > CDC:
> > > I'm slightly afraid of this. Rationale: an extensible piece designed specifically for a closed-source implementation of hub-and-spoke replication. I have some experience trying to "play along" with extensible things: https://issues.apache.org/jira/browse/CASSANDRA-12627 ("Thus, I'm -1 on {{PropertyOrEnvironmentSeedProvider}}.")
> > >
> > > Not a dig, but I can't even get something committed using an existing extensible interface. Heaven forbid a use case of mine would want to *change* the interface; I would probably get a -12. So I have no desire to try and maintain a CDC implementation. I see myself falling into the same old "why do you want to do this? -1" trap.
> > >
> > > Coordinator triggers:
> > > To bring things back really old school: the coordinator triggers everyone always wanted.
> > > In a nutshell, I DO believe they are easier to reason about than MV. It is pretty basic: it happens on the coordinator, there are no batchlogs or whatever; best effort, possibly involving more nodes, as the keys might be on different servers. Actually, I tend to like features like this. Once something comes on the downswing of the "software hype cycle", you know it is pretty stable, as everyone's all excited about other things.
> > >
> > > As I said, I know I can use Storm for top-N, so what is this feature? Well, I want to optimize my network transfer generally by building my batch mutations on the server. Seems reasonable. Maybe I want to have my own little "read before write" thing like CQL lists.
> > >
> > > The warts, having tried it: the first time I tried it, I found it did not work with non-batch mutations; I patched that in 3 hours. It took weeks before some CQL user had the same problem and it got fixed :) There was no dynamic loading at the time, so it was BYO class loader.
> > >
> > > Going against the grain and saying: the thing you have to realize with best-effort coordinator triggers is that the "transaction" could be incomplete, and well, that sucks, maybe, for some cases. But I actually felt the 2nd-index implementations force all problems into a type of "foreign key transactional integrity" that does not make sense for Cassandra.
> > >
> > > Have you ever used Elasticsearch? Their version of consistency is: write something, keep reading, and eventually you see it. Wildly popular :) It is a crazy world.
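[Editor's note] The coordinator-trigger shape Ed describes above can be sketched as follows. This is a toy Python model, not Cassandra's actual hook (that is the Java ITrigger interface): the dict-shaped mutations, the index_trigger helper, and coordinator_write are all hypothetical names for illustration. It shows the one idea Ed leans on: the trigger runs once on the coordinator and derives extra mutations before anything is shipped, which is also why the result is best effort rather than transactional.

```python
# Toy sketch of a coordinator trigger: augment an incoming write with
# derived mutations (here, a hand-rolled index row) on the coordinator.

def index_trigger(mutation: dict) -> list:
    """Derive extra mutations from the incoming one (no read-before-write)."""
    return [{
        "table": "user_by_email",
        "key": mutation["row"]["email"],   # index partition key
        "row": {"user_id": mutation["key"]},
    }]

def coordinator_write(mutation: dict, triggers: list) -> list:
    """Run triggers on the coordinator, then ship the whole batch.

    Best effort: if some replica writes fail midway, the base row and the
    trigger-generated rows can diverge -- the incomplete-"transaction" wart
    discussed in the thread.
    """
    batch = [mutation]
    for trigger in triggers:
        batch.extend(trigger(mutation))
    return batch  # in a real system this would now be sent to replicas

if __name__ == "__main__":
    m = {"table": "user", "key": "uuid-1", "row": {"email": "a@b.c"}}
    print(coordinator_write(m, [index_trigger]))  # base row + one index row
```

Note the network saving Ed mentions: the client sends one mutation, and the fan-out to extra rows is built server-side instead of in a client-assembled batch.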