I think this feature is important to the community and I don't want to stifle it, but if committers/contributors are working on the management process instead of testing 4.0, it takes away from that effort regardless of where the code lives. Waiting to merge until after 4.0, at a minimum, would benefit the testing effort.
Jordan

On Sat, Sep 22, 2018 at 10:06 AM Sankalp Kohli <kohlisank...@gmail.com> wrote:

This is not part of the core database and lives in a separate repo, so my impression is that it can continue to make progress. We can also keep making progress and simply not merge it until the freeze is lifted.

Open to ideas/suggestions if someone thinks otherwise.

On Sep 22, 2018, at 03:13, kurt greaves <k...@instaclustr.com> wrote:

Is this something we're moving ahead with despite the feature freeze?

On Sat, 22 Sep 2018 at 08:32, dinesh.jo...@yahoo.com.INVALID wrote:

I have created a sub-task, CASSANDRA-14783. Could we get some feedback before we begin implementing anything?

Dinesh

On Thursday, September 20, 2018, 11:22:33 PM PDT, Dinesh Joshi <dinesh.jo...@yahoo.com.INVALID> wrote:

I have updated the doc with a short paragraph providing the clarification. Sankalp's suggestion is already part of the doc. If there aren't further objections, could we move this discussion over to the Jira (CASSANDRA-14395)?

Dinesh

On Sep 18, 2018, at 10:31 AM, sankalp kohli <kohlisank...@gmail.com> wrote:

How about we start with a few basic features in the sidecar? For example:

1. Bulk nodetool commands: a user can curl any sidecar and run a nodetool command in bulk across the cluster:

   <sidecar>:<port>/bulk/nodetool/tablestats?arg0=keyspace_name.table_name&arg1=<if required>

And later:

2. Health checks.
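For illustration, a minimal sketch of what calling such a bulk endpoint could look like from a Java 11+ client. The host, port, argument values, and response handling are hypothetical placeholders; only the URL shape comes from the proposal above.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class BulkNodetoolClient {
        public static void main(String[] args) throws Exception {
            // Any sidecar instance could serve as the entry point and fan
            // the command out to its peers. node1/9043 are made-up values.
            URI uri = URI.create(
                "http://node1.example.com:9043/bulk/nodetool/tablestats"
                + "?arg0=my_keyspace.my_table");

            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(uri).GET().build();

            // The proposal doesn't specify a response format; one option
            // would be a JSON document keyed by node address.
            HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());
        }
    }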
On Thu, Sep 13, 2018 at 11:34 AM dinesh.jo...@yahoo.com.INVALID wrote:

I will update the document to add that point. The document was not meant to serve as a design or architectural document, but rather as something that would spark a discussion of the idea.

Dinesh

On Thursday, September 13, 2018, 10:59:34 AM PDT, Jonathan Haddad <j...@jonhaddad.com> wrote:

Most of the discussion and work was done off the mailing list - there's a big risk involved when folks disappear for months at a time and resurface with a big pile of code plus an agenda that you failed to loop everyone in on. In addition, by your own words, the design document didn't accurately describe what was being built. I don't write this to try to argue about it; I just want to add some perspective for those of us who weren't part of this discussion on a weekly basis over the last several months. Going forward, let's keep things on the ML so we can avoid confusion and frustration for all parties.

With that said - I think Blake made a really good point here, and it's helped me better understand the scope of what's being built. Looking at it from a different perspective, there doesn't seem to be as much overlap as I had initially thought. There's the machinery that runs certain tasks (what Joey has been working on) and the user-facing side of exposing that information in a management tool.

I do appreciate (and like) the idea of not trying to boil the ocean and working on things incrementally. Putting a thin layer on top of Cassandra that can perform cluster-wide tasks gives us an opportunity to move in the direction of a general-purpose, user-facing admin tool without committing to writing the full stack all at once (or even making decisions on it now). We do need a sensible way of doing rolling restarts / scrubs / scheduling, and Reaper wasn't built for that; even though we could add it, I'm not sure it's the best mechanism for the long term.

So if your goal is to add maturity to the project by making cluster-wide tasks easier through a framework to build on top of, I'm in favor of that, and I don't see it as antithetical to what I had in mind with Reaper. Rather, the two are more complementary than I had originally realized.

Jon

--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

On Thu, Sep 13, 2018 at 10:39 AM dinesh.jo...@yahoo.com.INVALID wrote:

I have a few clarifications -

The scope of the management process is not simply to run repair scheduling. Repair scheduling is one of many features we could implement or adopt from existing sources. So could we please split the management process discussion from repair scheduling?

After re-reading the management process proposal, I see we failed to communicate a basic idea in the document. We wanted to take a pluggable approach to the various activities the management process could perform. This would accommodate different implementations of common activities such as repair. The management process would provide the basic framework, and it would have default implementations for some of the basic activities. This would allow for speedier iteration cycles and keep things extensible.

Turning to some questions that Jon and others have raised: when I +1, my intention is to fully contribute and stay with this community. That said, while things feel rushed to some, to me it feels like analysis paralysis. We're looking for actionable feedback, and to discuss the management process, _not_ repair scheduling solutions.

Thanks,
Dinesh

On Sep 12, 2018, at 6:24 PM, sankalp kohli <kohlisank...@gmail.com> wrote:

Here is a list of open discussion points from the voting thread. I think some are already answered, but I will still gather the questions here.

From several people:
1. The vote is rushed and we need more time for discussion.

From Sylvain:
2. About the voting process... I think that was addressed by Jeff Jirsa and deserves a separate thread, as it is not directly related to this one.
3. Does the project need a sidecar?

From Jonathan Haddad:
4. Are the people voting +1 willing to contribute?

From Jonathan Ellis:
5. A list of the feature set, maturity, and maintainer availability of Reaper or any other project being donated.

From Mick Semb Wever:
6. We should not vote on these things and instead build consensus.

Open questions from this thread:
7. What technical debt are we talking about in Reaper? Can someone give concrete examples?
8. What is the timeline for donating Reaper to Apache Cassandra?
On Wed, Sep 12, 2018 at 3:49 PM sankalp kohli <kohlisank...@gmail.com> wrote:

(Using this thread and not the vote thread intentionally.)

For folks talking about the vote being rushed: I would use the email from Joseph to show this is not rushed. There was no email on this thread for 4 months until I pinged.

Dec 2016: Vinay worked with Jon and Alex to try to collaborate on Reaper to come up with design goals for a repair scheduler that could work at Netflix scale.

~Feb 2017: Netflix believes that fundamental design gaps prevented us from using Reaper, as it relies heavily on remote JMX connections and central coordination.

Sep 2017: Vinay gives a lightning talk at NGCC about a highly available and distributed repair scheduling sidecar/tool. He is encouraged by multiple committers to build repair scheduling into the daemon itself, not as a sidecar, so the database is truly eventually consistent.

~Jun 2017 - Feb 2018: Based on internal need and the positive feedback at NGCC, Vinay and myself prototype the distributed repair scheduler within Priam and roll it out at Netflix scale.

Mar 2018: I open a Jira (CASSANDRA-14346) along with a detailed 20-page design document for adding repair scheduling to the daemon itself, and open the design up for feedback from the community. We get feedback from Alex, Blake, Nate, Stefan, and Mick. As far as I know, there were zero proposals to contribute Reaper at this point. We hear the consensus that the community would prefer repair scheduling in a separate distributed sidecar rather than in the daemon itself, and we re-work the design to match this consensus, re-aligning with our original proposal at NGCC.

Apr 2018: Blake brings the discussion of repair scheduling to the dev list (https://lists.apache.org/thread.html/760fbef677f27aa5c2ab4c375c7efeb81304fea428deff986ba1c2eb@%3Cdev.cassandra.apache.org%3E). Many community members give positive feedback that we should solve it as part of Cassandra, and there is still no mention of contributing Reaper at this point. The last message is my attempted summary giving context on how we want to take the best of all the sidecars (OpsCenter, Priam, Reaper) and ship them with Cassandra.

Apr 2018: Dinesh opens CASSANDRA-14395 along with a public design document for gathering feedback on a general management sidecar. Sankalp and Dinesh encourage Vinay and myself to kickstart that sidecar using the repair scheduler patch.

Apr 2018: Dinesh reaches out to the dev list (https://lists.apache.org/thread.html/a098341efd8f344494bcd2761dba5125e971b59b1dd54f282ffda253@%3Cdev.cassandra.apache.org%3E) about the general management process to gain further feedback. All feedback remains positive, as it is a potential place for multiple community members to contribute their various sidecar functionality.

May-Jul 2018: Vinay and I work on creating a basic sidecar for running the repair scheduler, based on the feedback from the community in CASSANDRA-14346 and CASSANDRA-14395.

Jun 2018: I bump CASSANDRA-14346 indicating we're still working on this; nobody objects.

Jul 2018: Sankalp asks on the dev list if anyone has feature Jiras that need review before 4.0. I mention again that we've nearly got the basic sidecar and repair scheduling work done and will need help with review. No one responds.

Aug 2018: We submit a patch that brings a basic distributed sidecar and robust distributed repair to Cassandra itself. Dinesh mentions that he will try to review. Now folks appear concerned about it being in-tree, suggesting instead that it should go in a different repo altogether. I don't think we have consensus on the repo choice yet.

On Sun, Sep 9, 2018 at 9:13 AM sankalp kohli <kohlisank...@gmail.com> wrote:

I agree with Jon, and I think folks who are talking about tech debt in Reaper should elaborate with examples of that debt. Can we be more precise and list it down? I see it spread out over this long email thread!!
On Sun, Sep 9, 2018 at 6:29 AM Elliott Sims <elli...@backblaze.com> wrote:

A big one to add to your list there, IMO as a user:

* An API for determining detailed repair state (and history?). Essentially, something beyond just "is some sort of repair running?", so that tools like Reaper can parallelize better.
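To make that concrete, a hedged sketch of what such a detailed repair-state API could expose. Every name here is hypothetical; nothing like this exists in Cassandra today.

    import java.time.Instant;
    import java.util.List;

    // Hypothetical read-only view of repair progress; not an existing API.
    public interface RepairStateApi {
        enum Status { PENDING, RUNNING, SUCCEEDED, FAILED }

        // One entry per (keyspace, table, token range) repair session.
        record RepairSession(String keyspace, String table,
                             String rangeStart, String rangeEnd,
                             Status status, Instant startedAt) {}

        // Sessions currently in flight, so an external scheduler such as
        // Reaper could avoid overlapping ranges and parallelize safely.
        List<RepairSession> activeSessions();

        // Completed sessions, newest first, covering the history use case.
        List<RepairSession> history(int limit);
    }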
On Sun, Sep 9, 2018 at 8:30 AM, Stefan Podkowinski <s...@apache.org> wrote:

Does it have to be a single project with functionality provided by multiple plugins? Designing a plugin API at this point seems to be a bit early and comes with additional complexity around managing plugins in general.

I was thinking more in the direction of: "what can we do to enable people to create any kind of sidecar or tooling solution?". Things like:

Common cluster discovery and management API
* Detect local Cassandra processes
* Discover and receive events on cluster topology
* Get assigned tokens for nodes
* Read node configuration
* Health checks (as already proposed)

Any sidecars should be easy to install on nodes that already run Cassandra
* Scripts for packaging (tar, deb, rpm)
* Templates for systemd support, optionally with an auto-startup dependency on the Cassandra main process

Integration testing
* Provide a basic testing framework for mocking cluster state and messages

Support for other languages / avoid having to use JMX
* JMX bridge (HTTP? gRPC? already implemented in #14346?)

Obviously the whole sidecar discussion is not moving in a direction everyone's happy with. Would it be an option to take a step back and start implementing such a tooling framework, with scripts and libraries for the features described above, as a small GitHub project, instead of putting an existing sidecar solution up for vote? If that works and we get people collaborating on code shared between existing sidecars, then we could take the next step and either revisit the "official Cassandra sidecar" topic, or add the resulting client tooling framework as an official sub-project to the Cassandra project (maybe via the Apache Incubator).
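As a strawman for the first bullet group above, such a discovery and management API could be as small as a single library interface that any sidecar consumes. All names are hypothetical, not an existing API:

    import java.net.InetAddress;
    import java.util.List;
    import java.util.Map;
    import java.util.function.Consumer;

    // Hypothetical shared API so independent sidecars don't each
    // reimplement cluster discovery and node introspection.
    public interface ClusterView {
        // True if a Cassandra process is running on this host.
        boolean isLocalNodeUp();

        // Current live members of the cluster, as seen by the local node.
        List<InetAddress> liveNodes();

        // Tokens assigned to the given node.
        List<String> tokensOf(InetAddress node);

        // The local node's configuration (cassandra.yaml), by setting name.
        Map<String, String> localConfig();

        // Subscribe to topology events (node up/down, token movement).
        void onTopologyChange(Consumer<String> listener);
    }

Keeping this a plain library interface, rather than a plugin API, matches the "enable any kind of sidecar" framing: each tool links the library and stays otherwise independent.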
On 08.09.18 02:49, Joseph Lynch wrote:

On Fri, Sep 7, 2018 at 5:03 PM Jonathan Haddad <j...@jonhaddad.com> wrote:
> We haven't even defined any requirements for an admin tool. It's hard to make a case for anything without agreement on what we're trying to build.

We were/are trying to sketch out scope/requirements in the #14395 and #14346 tickets, as well as their associated design documents. I think the general proposed direction is a distributed 1:1 management sidecar process, similar in architecture to Netflix's Priam, except explicitly built to be general and pluggable by anyone rather than tightly coupled to AWS.

Dinesh, Vinay, and I were aiming for a small amount of scope at first, taking an iterative approach with just enough upfront design, but not so much that we are unable to make any progress at all. For example, maybe something like:

1. Get a super simple and non-controversial sidecar process that ships with Cassandra and exposes a lightweight HTTP interface to e.g. some basic JMX endpoints.
2a. Add a pluggable execution engine for cron/oneshot/scheduled jobs, with the basic interfaces, state store, and so on (a sketch follows this message).
2b. Start scoping and implementing the full HTTP interface, e.g. backup status, cluster health status, etc.
3a. Start integrating implementations of the jobs from 2a, such as snapshot, backup, cluster restart, daemon + sstable upgrade, repair, etc.
3b. Start integrating UI components that pair with the HTTP interface from 2b.
4. ?? Perhaps start unlocking next-generation operations, like moving "background" activities such as compaction, streaming, and repair into one or more sidecar-contained processes, so the main daemon only handles read+write requests.

There are going to be a lot of questions to answer, and I think trying to answer them all up front will mean that we get nowhere or make unfortunate compromises that cripple the project from the start. If people think we need to do more design and discussion than we have been doing, then we can spend more time on the design, but personally I'd rather start iterating on code and prove value incrementally. If it doesn't work out, we won't release it GA to the community ...

-Joey
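To give a flavor of step 2a, a hedged sketch of the kind of pluggable job interface being described. The names and shapes are purely illustrative assumptions; the real interfaces would be settled during review of CASSANDRA-14346/14395.

    import java.util.concurrent.CompletableFuture;

    // Hypothetical contract for jobs run by the sidecar's execution engine
    // (step 2a); snapshot, backup, or repair jobs (step 3a) would implement it.
    public interface ManagementJob {
        String name();

        // Runs the job. The engine would persist checkpoints in its state
        // store so a restarted sidecar can resume or report on past runs.
        CompletableFuture<JobResult> execute(JobContext context);
    }

    interface JobContext {
        // Record a durable progress marker in the engine's state store.
        void recordProgress(String checkpoint);
    }

    enum JobResult { SUCCESS, FAILURE, CANCELLED }

Returning a future keeps the engine free to schedule cron, oneshot, and long-running jobs on the same pool without blocking its scheduling loop.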