No. You just change the partitioner. That's all.

On 05.03.2017 09:15, "DuyHai Doan" <doanduy...@gmail.com> wrote:
> "How can that be achieved? I haven't done "scientific research" yet, but I guess an "MV partitioner" could do the trick. Instead of applying the regular partitioner, an MV partitioner would calculate the PK of the base table (which is always possible) and then apply the regular partitioner."
>
> The main purpose of MV is to avoid the drawbacks of the 2nd-index architecture, e.g. having to scan a lot of nodes to fetch the results.
>
> With MV, since you give the partition key, the guarantee is that you'll hit a single node.
>
> Now if you put MV data on the same node as the base table data, you're doing more or less the same thing as a 2nd index.
>
> Let's take a dead simple example:
>
> CREATE TABLE user (user_id uuid PRIMARY KEY, email text);
>
> CREATE MATERIALIZED VIEW user_by_email AS
>   SELECT * FROM user
>   WHERE user_id IS NOT NULL AND email IS NOT NULL
>   PRIMARY KEY ((email), user_id);
>
> SELECT * FROM user_by_email WHERE email = 'xxx';
>
> With this query, how can you find the user_id that corresponds to email 'xxx' so that your MV partitioner idea can work?
>
> On Sun, Mar 5, 2017 at 9:05 AM, benjamin roth <brs...@gmail.com> wrote:
>
> > While I was reading the MV paragraph in your post, an idea popped up:
> >
> > The problem with MV inconsistencies and inconsistent range movements is that the "MV contract" is broken. This only happens because base data and replica data reside on different hosts. If base data + replicas stayed on the same host, then a rebuild/remove would always stream both matching parts of a base table + MV.
> >
> > So my idea:
> > Why not make a replica ALWAYS stay local, regardless of where the token of an MV would point? That would solve these problems:
> > 1. Rebuild / remove node would not break the MV contract
> > 2. A write always stays local:
> >
> > a) That means replication happens synchronously.
> > That means a quorum write to the base table guarantees instant data availability with a quorum read on the view.
> >
> > b) It saves network round trips + request/response handling and helps to keep a cluster healthier in case of bulk operations (like repair streams or rebuild streams). Write load stays local and is not spread across the whole cluster. I think it makes the load in these situations more predictable.
> >
> > How can that be achieved? I haven't done "scientific research" yet, but I guess an "MV partitioner" could do the trick. Instead of applying the regular partitioner, an MV partitioner would calculate the PK of the base table (which is always possible) and then apply the regular partitioner.
> >
> > I'll create a proper Jira for it on Monday. Currently it's Sunday here and my family wants me back, so just a few thoughts on this right now.
> >
> > Any feedback is appreciated!
> >
> > 2017-03-05 6:34 GMT+01:00 Edward Capriolo <edlinuxg...@gmail.com>:
> >
> > > On Sat, Mar 4, 2017 at 10:26 AM, Jeff Jirsa <jji...@gmail.com> wrote:
> > >
> > > > On Mar 4, 2017, at 7:06 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> > > >
> > > > > On Fri, Mar 3, 2017 at 12:04 PM, Jeff Jirsa <jji...@gmail.com> wrote:
> > > > >
> > > > > > On Fri, Mar 3, 2017 at 5:40 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> > > > > >
> > > > > > > I used them. I built do-it-yourself secondary indexes with them. They have their gotchas, but so do all the secondary index implementations. Just because DataStax does not write about something doesn't mean nobody uses it. Let's see, like 5 years ago there was this: https://github.com/hmsonline/cassandra-triggers
> > > > > >
> > > > > > Still in use? How'd it work? Production ready? Would you still do it that way in 2017?
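[Editor's note] The routing trade-off debated earlier in the thread (Benjamin's "local MV replica" idea vs. DuyHai's single-node-read guarantee) can be sketched with a toy hash ring. Everything below is illustrative: the 8-node ring, the md5-based token, and the function names are assumptions for the sketch, not Cassandra's real Murmur3Partitioner or read path.

```python
# Toy model: where does a read of user_by_email have to go?
import hashlib

NODES = [f"node{i}" for i in range(8)]  # assumed 8-node cluster, RF=1

def token(partition_key: str) -> int:
    # Stand-in for Cassandra's partitioner (really Murmur3, not md5)
    return int(hashlib.md5(partition_key.encode()).hexdigest(), 16)

def owner(partition_key: str) -> str:
    return NODES[token(partition_key) % len(NODES)]

def mv_read_regular(email: str) -> list:
    # Regular MV partitioning: the view row is placed by its own partition
    # key (email), so a reader who only knows the email hits one node.
    return [owner(email)]

def mv_read_local(email: str) -> list:
    # "Local MV partitioner": the view row is placed by the *base* PK
    # (user_id). The writer knows user_id, so the write stays local -- but a
    # reader who only has the email cannot compute that token and must ask
    # every node: the secondary-index scatter-gather MV was meant to avoid.
    return list(NODES)

if __name__ == "__main__":
    print(len(mv_read_regular("alice@example.com")))  # 1 node contacted
    print(len(mv_read_local("alice@example.com")))    # 8 nodes contacted
```

The sketch only models read routing; it ignores replication factor and Benjamin's streaming-consistency argument, which is about writes and rebuilds rather than reads.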
> > > > > > > There is a fairly large divergence between what actual users do and what other groups "say" actual users do, in some cases.
> > > > > >
> > > > > > A lot of people don't share what they're doing (for business reasons, or because they don't think it's important, or because they don't know how/where), and that's fine, but it makes it hard for anyone to know what features are used, or how well they're really working in production.
> > > > > >
> > > > > > I've seen a handful of "how do we use triggers" questions in IRC, and they weren't unreasonable questions, but they seemed like a lot of pain, and more than one of those people ultimately came back and said they used some other mechanism (and of course, some of them silently disappear, so we have no idea if it worked or not).
> > > > > >
> > > > > > If anyone's actively using triggers, please don't keep it a secret. Knowing that they're being used would be a great way to justify continuing to maintain them.
> > > > > >
> > > > > > - Jeff
> > > > >
> > > > > "Still in use? How'd it work? Production ready? Would you still do it that way in 2017?"
> > > > >
> > > > > I mean, that is a loaded question. How long has Cassandra had secondary indexes? Did they work well? Would you use them? How many times were they re-written?
> > > >
> > > > It wasn't really meant to be a loaded question; I was being sincere.
> > > >
> > > > But I'll answer: secondary indexes suck for many use cases, but they're invaluable for their actual intended purpose, and I have no idea how many times they've been rewritten, but they're production ready for their narrow use case (defined by cardinality).
> > > > Is there a real triggers use case still? An alternative to MVs? An alternative to CDC? I've never implemented triggers - since you have, what's the level of surprise for the developer?
> > >
> > > :) You mention alternatives: let's break them down.
> > >
> > > MV:
> > > They seem to have a lot of promise, i.e. you can use them for things other than equality searches, and I do think the CQL example with the top-N high scores is pretty useful. Then again, our buddy Mr. Roth has a thread named "Rebuild / remove node with MV is inconsistent". I actually think a lot of the use cases for MV fall into the category of "something you should actually be doing with Storm". I can vibe with the concept of not needing a streaming platform, but I KNOW Storm would do this correctly. I don't want to land on something like 2nd index v1/v2, where there were fundamental flaws at scale. (Not saying this is the case, but the rebuild thing seems a bit scary.)
> > >
> > > CDC:
> > > I'm slightly afraid of this. Rationale: an extensible piece designed specifically for a closed-source implementation of hub-and-spoke replication. I have some experience trying to "play along" with extensible things: https://issues.apache.org/jira/browse/CASSANDRA-12627 ("Thus, I'm -1 on {{PropertyOrEnvironmentSeedProvider}}.")
> > >
> > > Not a dig, but I can't even get something committed using an existing extensible interface. Heaven forbid a use case of mine would want to *change* the interface; I would probably get a -12. So I have no desire to try and maintain a CDC implementation. I see myself falling into the same old "why do you want to do this? -1" trap.
> > >
> > > Coordinator triggers:
> > > To bring things back really old school: the coordinator triggers everyone always wanted.
> > > In a nutshell, I DO believe they are easier to reason about than MV. It is pretty basic: it happens on the coordinator, there are no batchlogs or whatever; best effort, possibly involving more nodes, as the keys might be on different servers. Actually, I tend to like features like this. Once something comes on the downswing of the "software hype cycle", you know it is pretty stable, as everyone's all excited about other things.
> > >
> > > As I said, I know I can use Storm for top-N, so what is this feature? Well, I want to optimize my network transfer generally by building my batch mutations on the server. Seems reasonable. Maybe I want to have my own little "read before write" thing like CQL lists.
> > >
> > > The warts, having tried it: the first time I tried it, I found it did not work with non-batch mutations; I patched that in 3 hours. It took weeks before some CQL user had the same problem and it got fixed :) There was no dynamic loading at the time, so it was BYO class loader.
> > >
> > > Going against the grain and saying: the thing you have to realize with best-effort coordinator triggers is that the "transaction" could be incomplete, and well, that sucks, maybe, for some cases. But I actually felt the 2nd-index implementations force all problems into a type of "foreign key transactional integrity" that does not make sense for Cassandra.
> > >
> > > Have you ever used Elasticsearch? Their version of consistency is: write something, keep reading, and eventually you see it. Wildly popular :) It is a crazy world.
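[Editor's note] The coordinator-trigger shape Ed describes above can be sketched as follows. This is a toy Python model, not Cassandra's actual hook (that is the Java ITrigger interface): the dict-shaped mutations, the index_trigger helper, and coordinator_write are all hypothetical names for illustration. It shows the one idea Ed leans on: the trigger runs once on the coordinator and derives extra mutations before anything is shipped, which is also why the result is best effort rather than transactional.

```python
# Toy sketch of a coordinator trigger: augment an incoming write with
# derived mutations (here, a hand-rolled index row) on the coordinator.

def index_trigger(mutation: dict) -> list:
    """Derive extra mutations from the incoming one (no read-before-write)."""
    return [{
        "table": "user_by_email",
        "key": mutation["row"]["email"],   # index partition key
        "row": {"user_id": mutation["key"]},
    }]

def coordinator_write(mutation: dict, triggers: list) -> list:
    """Run triggers on the coordinator, then ship the whole batch.

    Best effort: if some replica writes fail midway, the base row and the
    trigger-generated rows can diverge -- the incomplete-"transaction" wart
    discussed in the thread.
    """
    batch = [mutation]
    for trigger in triggers:
        batch.extend(trigger(mutation))
    return batch  # in a real system this would now be sent to replicas

if __name__ == "__main__":
    m = {"table": "user", "key": "uuid-1", "row": {"email": "a@b.c"}}
    print(coordinator_write(m, [index_trigger]))  # base row + one index row
```

Note the network saving Ed mentions: the client sends one mutation, and the fan-out to extra rows is built server-side instead of in a client-assembled batch.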