Is this something we're moving ahead with despite the feature freeze?

On Sat, 22 Sep 2018 at 08:32, dinesh.jo...@yahoo.com.INVALID <dinesh.jo...@yahoo.com.invalid> wrote:
> I have created a sub-task - CASSANDRA-14783. Could we get some feedback before we begin implementing anything?
>
> Dinesh
>
> On Thursday, September 20, 2018, 11:22:33 PM PDT, Dinesh Joshi <dinesh.jo...@yahoo.com.INVALID> wrote:
>
> I have updated the doc with a short paragraph providing the clarification. Sankalp's suggestion is already part of the doc. If there aren't further objections could we move this discussion over to the jira (CASSANDRA-14395)?
>
> Dinesh
>
> > On Sep 18, 2018, at 10:31 AM, sankalp kohli <kohlisank...@gmail.com> wrote:
> >
> > How about we start with a few basic features in side car. How about starting with this
> > 1. Bulk nodetool commands: User can curl any sidecar and be able to run a nodetool command in bulk across the cluster.
> >
> > <sidecar>:<port>/bulk/nodetool/tablestats?arg0=keyspace_name.table_name&arg1=<if required>
> >
> > And later
> > 2: Health checks.
> >
> > On Thu, Sep 13, 2018 at 11:34 AM dinesh.jo...@yahoo.com.INVALID <dinesh.jo...@yahoo.com.invalid> wrote:
>
> I will update the document to add that point. The document did not mean to serve as a design or architectural document but rather something that would spark a discussion on the idea.
>
> Dinesh
>
> On Thursday, September 13, 2018, 10:59:34 AM PDT, Jonathan Haddad <j...@jonhaddad.com> wrote:
> >
> > Most of the discussion and work was done off the mailing list - there's a big risk involved when folks disappear for months at a time and resurface with big pile of code plus an agenda that you failed to loop everyone in on. In addition, by your own words the design document didn't accurately describe what was being built. I don't write this to try to argue about it, I just want to put some perspective for those of us that weren't part of this discussion on a weekly basis over the last several months.
> > Going forward let's keep things on the ML so we can avoid confusion and frustration for all parties.
> >
> > With that said - I think Blake made a really good point here and it's helped me understand the scope of what's being built better. Looking at it from a different perspective it doesn't seem like there's as much overlap as I had initially thought. There's the machinery that runs certain tasks (what Joey has been working on) and the user facing side of exposing that information in management tool.
> >
> > I do appreciate (and like) the idea of not trying to boil the ocean, and working on things incrementally. Putting a thin layer on top of Cassandra that can perform cluster wide tasks does give us an opportunity to move in the direction of a general purpose user-facing admin tool without committing to trying to write the full stack all at once (or even make decisions on it now). We do need a sensible way of doing rolling restarts / scrubs / scheduling and Reaper wasn't built for that, and even though we can add it I'm not sure if it's the best mechanism for the long term.
> >
> > So if your goal is to add maturity to the project by making cluster wide tasks easier by providing a framework to build on top of, I'm in favor of that and I don't see it as antithetical to what I had in mind with Reaper. Rather, the two are more complementary than I had originally realized.
> >
> > Jon
> >
> > On Thu, Sep 13, 2018 at 10:39 AM dinesh.jo...@yahoo.com.INVALID <dinesh.jo...@yahoo.com.invalid> wrote:
> >
> > > I have a few clarifications -
> > > The scope of the management process is not to simply run repair scheduling. Repair scheduling is one of the many features we could implement or adopt from existing sources. So could we please split the Management Process discussion and the repair scheduling?
> > > After re-reading the management process proposal, I see we failed to communicate a basic idea in the document. We wanted to take a pluggable approach to various activities that the management process could perform. This could accommodate different implementations of common activities such as repair. The management process would provide the basic framework and it would have default implementations for some of the basic activities. This would allow for speedier iteration cycles and keep things extensible.
> > > Turning to some questions that Jon and others have raised, when I +1, my intention is to fully contribute and stay with this community. That said, things feel rushed for some but for me it feels like analysis paralysis. We're looking for actionable feedback and to discuss the management process _not_ repair scheduling solutions.
> > > Thanks,
> > > Dinesh
> > >
> > > On Sep 12, 2018, at 6:24 PM, sankalp kohli <kohlisank...@gmail.com> wrote:
> > > Here is a list of open discussion points from the voting thread. I think some are already answered but I will still gather these questions here.
> > >
> > > From several people:
> > > 1. Vote is rushed and we need more time for discussion.
> > >
> > > From Sylvain
> > > 2. About the voting process...I think that was addressed by Jeff Jirsa and deserves a separate thread as it is not directly related to this thread.
> > > 3. Does the project need a side car.
> > >
> > > From Jonathan Haddad
> > > 4. Are people doing +1 willing to contribute
> > >
> > > From Jonathan Ellis
> > > 5. List of feature set, maturity, maintainer availability from Reaper or any other project being donated.
> > >
> > > Mick Semb Wever
> > > 6. We should not vote on these things and instead build consensus.
> > >
> > > Open Questions from this thread
> > > 7.
> > > What technical debts we are talking about in Reaper. Can someone give concrete examples.
> > > 8. What is the timeline of donating Reaper to Apache Cassandra.
> > >
> > > On Wed, Sep 12, 2018 at 3:49 PM sankalp kohli <kohlisank...@gmail.com> wrote:
> > >
> > > (Using this thread and not the vote thread intentionally)
> > > For folks talking about vote being rushed. I would use the email from Joseph to show this is not rushed. There was no email on this thread for 4 months until I pinged.
> > >
> > > Dec 2016: Vinay worked with Jon and Alex to try to collaborate on Reaper to come up with design goals for a repair scheduler that could work at Netflix scale.
> > >
> > > ~Feb 2017: Netflix believes that the fundamental design gaps prevented us from using Reaper as it relies heavily on remote JMX connections and central coordination.
> > >
> > > Sep. 2017: Vinay gives a lightning talk at NGCC about a highly available and distributed repair scheduling sidecar/tool. He is encouraged by multiple committers to build repair scheduling into the daemon itself and not as a sidecar so the database is truly eventually consistent.
> > >
> > > ~Jun. 2017 - Feb. 2018: Based on internal need and the positive feedback at NGCC, Vinay and myself prototype the distributed repair scheduler within Priam and roll it out at Netflix scale.
> > >
> > > Mar. 2018: I open a Jira (CASSANDRA-14346) along with a detailed 20 page design document for adding repair scheduling to the daemon itself and open the design up for feedback from the community. We get feedback from Alex, Blake, Nate, Stefan, and Mick. As far as I know there were zero proposals to contribute Reaper at this point.
> > > We hear the consensus that the community would prefer repair scheduling in a separate distributed sidecar rather than in the daemon itself and we re-work the design to match this consensus, re-aligning with our original proposal at NGCC.
> > >
> > > Apr 2018: Blake brings the discussion of repair scheduling to the dev list (https://lists.apache.org/thread.html/760fbef677f27aa5c2ab4c375c7efeb81304fea428deff986ba1c2eb@%3Cdev.cassandra.apache.org%3E). Many community members give positive feedback that we should solve it as part of Cassandra and there is still no mention of contributing Reaper at this point. The last message is my attempted summary giving context on how we want to take the best of all the sidecars (OpsCenter, Priam, Reaper) and ship them with Cassandra.
> > >
> > > Apr. 2018: Dinesh opens CASSANDRA-14395 along with a public design document for gathering feedback on a general management sidecar. Sankalp and Dinesh encourage Vinay and myself to kickstart that sidecar using the repair scheduler patch
> > >
> > > Apr 2018: Dinesh reaches out to the dev list (https://lists.apache.org/thread.html/a098341efd8f344494bcd2761dba5125e971b59b1dd54f282ffda253@%3Cdev.cassandra.apache.org%3E) about the general management process to gain further feedback. All feedback remains positive as it is a potential place for multiple community members to contribute their various sidecar functionality.
> > >
> > > May-Jul 2018: Vinay and I work on creating a basic sidecar for running the repair scheduler based on the feedback from the community in CASSANDRA-14346 and CASSANDRA-14395
> > >
> > > Jun 2018: I bump CASSANDRA-14346 indicating we're still working on this, nobody objects
> > >
> > > Jul 2018: Sankalp asks on the dev list if anyone has feature Jiras anyone needs review for before 4.0, I mention again that we've nearly got the basic sidecar and repair scheduling work done and will need help with review. No one responds.
> > >
> > > Aug 2018: We submit a patch that brings a basic distributed sidecar and robust distributed repair to Cassandra itself. Dinesh mentions that he will try to review. Now folks appear concerned about it being in tree and instead maybe it should go in a different repo all together. I don't think we have consensus on the repo choice yet.
> > >
> > > On Sun, Sep 9, 2018 at 9:13 AM sankalp kohli <kohlisank...@gmail.com> wrote:
> > >
> > > I agree with Jon and I think folks who are talking about tech debts in Reaper should elaborate with examples about these tech debts. Can we be more precise and list them down? I see it spread out over this long email thread!!
> > >
> > > On Sun, Sep 9, 2018 at 6:29 AM Elliott Sims <elli...@backblaze.com> wrote:
> > >
> > > A big one to add to your list there, IMO as a user:
> > > * API for determining detailed repair state (and history?). Essentially, something beyond just "Is some sort of repair running?" so that tools like Reaper can parallelize better.
> > >
> > > On Sun, Sep 9, 2018 at 8:30 AM, Stefan Podkowinski <s...@apache.org> wrote:
> > >
> > > Does it have to be a single project with functionality provided by multiple plugins?
> > > Designing a plugin API at this point seems to be a bit early and comes with additional complexity around managing plugins in general.
> > >
> > > I was more thinking into the direction of: "what can we do to enable people to create any kind of side car or tooling solution?". Things like:
> > >
> > > Common cluster discovery and management API
> > > * Detect local Cassandra processes
> > > * Discover and receive events on cluster topology
> > > * Get assigned tokens for nodes
> > > * Read node configuration
> > > * Health checks (as already proposed)
> > >
> > > Any side cars should be easy to install on nodes that already run Cassandra
> > > * Scripts for packaging (tar, deb, rpm)
> > > * Templates for systemd support, optionally with auto-startup dependency on the Cassandra main process
> > >
> > > Integration testing
> > > * Provide basic testing framework for mocking cluster state and messages
> > >
> > > Support for other languages / avoid having to use JMX
> > > * JMX bridge (HTTP? gRPC?, already implemented in #14346?)
> > >
> > > Obviously the whole side car discussion is not moving into a direction everyone's happy with. Would it be an option to take a step back and start implementing such a tooling framework with scripts and libraries for the features described above, as a small GitHub project, instead of putting an existing side-car solution up for vote? If that would work and we get people collaborating on code shared between existing side-cars, then we could take the next step and think about either revisiting the "official Cassandra side-car" topic, or adding the created client tooling framework as official sub-project to the Cassandra project (maybe via Apache incubator).
> > >
> > > On 08.09.18 02:49, Joseph Lynch wrote:
> > >
> > > On Fri, Sep 7, 2018 at 5:03 PM Jonathan Haddad <j...@jonhaddad.com> wrote:
> > >
> > > We haven’t even defined any requirements for an admin tool. It’s hard to make a case for anything without agreement on what we’re trying to build.
> > >
> > > We were/are trying to sketch out scope/requirements in the #14395 and #14346 tickets as well as their associated design documents. I think the general proposed direction is a distributed 1:1 management sidecar process similar in architecture to Netflix's Priam except explicitly built to be general and pluggable by anyone rather than tightly coupled to AWS.
> > >
> > > Dinesh, Vinay and I were aiming for low amounts of scope at first and take things in an iterative approach with just enough upfront design but not so much we are unable to make any progress at all. For example maybe something like:
> > >
> > > 1. Get a super simple and non controversial sidecar process that ships with Cassandra and exposes a lightweight HTTP interface to e.g. some basic JMX endpoints
> > > 2a. Add a pluggable execution engine for cron/oneshot/scheduled jobs with the basic interfaces and state store and such
> > > 2b. Start scoping and implementing the full HTTP interface, e.g. backup status, cluster health status, etc ...
> > > 3a. Start integrating implementations of the jobs from 2a such as snapshot, backup, cluster restart, daemon + sstable upgrade, repair, etc
> > > 3b. Start integrating UI components that pair with the HTTP interface from 2b
> > > 4. ??
> > > Perhaps start unlocking next generation operations like moving "background" activities like compaction, streaming, repair etc into one or more sidecar contained processes to ensure the main daemon only handles read+write requests
> > >
> > > There are going to be a lot of questions to answer, and I think trying to answer them all up front will mean that we get nowhere or make unfortunate compromises that cripple the project from the start. If people think we need to do more design and discussion than we have been doing then we can spend more time on the design, but personally I'd rather start iterating on code and prove value incrementally. If it doesn't work out we won't release it GA to the community ...
> > >
> > > -Joey
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> > --
> > Jon Haddad
> > http://www.rustyrazorblade.com
> > twitter: rustyrazorblade