This is a good initiative. We have advocated for and run a sidecar for the past 5+ years, and we've learned a lot and improved it considerably in that time. We look forward to moving features from Priam (such as backup, HTTP -> JMX, etc.) incrementally to this sidecar as they make sense.

Thanks,
Vinay Chella

On Fri, Apr 13, 2018 at 7:01 AM, Eric Evans <john.eric.ev...@gmail.com> wrote:

> On Thu, Apr 12, 2018 at 4:41 PM, Dinesh Joshi <dinesh.jo...@yahoo.com.invalid> wrote:
> >
> > Hey all -
> >
> > With the uptick in discussion around Cassandra operability and after discussing potential solutions with various members of the community, we would like to propose the addition of a management process/sub-project into Apache Cassandra. The process would be responsible for common operational tasks like bulk execution of nodetool commands, backup/restore, and health checks, among others. We feel we have a proposal that will garner some discussion and debate but is likely to reach consensus.
> >
> > While the community, in large part, agrees that these features should exist “in the database”, there is debate on how they should be implemented. Primarily, whether or not to use an external process or build on CassandraDaemon. This is an important architectural decision but we feel the most critical aspect is not where the code runs but that the operator still interacts with the notion of a single database. Multi-process databases are as old as Postgres and continue to be common in newer systems like Druid. As such, we propose a separate management process for the following reasons:
> >
> > - Resource isolation & Safety: Features in the management process will not affect C*'s read/write path which is critical for stability. An isolated process has several technical advantages including preventing use of unnecessary dependencies in CassandraDaemon, separation of JVM resources like thread pools and heap, and preventing bugs from adversely affecting the main process. In particular, GC tuning can be done separately for the two processes, hopefully helping to improve, or at least not adversely affect, tail latencies of the main process.
> >
> > - Health Checks & Recovery: Currently users implement health checks in their own sidecar process. Implementing them in the serving process does not make sense because if the JVM running the CassandraDaemon goes south, the health checks and potentially any recovery code may not be able to run. Having a management process running in isolation opens up the possibility to not only report the health of the C* process, such as long GC pauses or a stuck JVM, but also to recover from it. Having a list of basic health checks that are tested with every C* release and officially supported will help boost confidence in C* quality and make it easier to operate.
> >
> > - Reduced Risk: By having a separate daemon, we open up the possibility of contributing features that would not otherwise have been considered, e.g. a UI. A library that starts many background threads and is operated completely differently would likely be considered too risky for CassandraDaemon but is a good candidate for the management process.
>
> Makes sense IMO.
>
> > What can go into the management process?
> >
> > - Features that are non-essential for serving reads & writes, e.g. Backup/Restore or Running Health Checks against the CassandraDaemon, etc.
> >
> > - Features that do not make the management process critical for functioning of the serving process. In other words, if someone does not wish to use this management process, they are free to disable it.
> >
> > We would like to initially build a minimal set of features such as health checks and bulk commands into the first iteration of the management process. We would use the same software stack that is used to build the current CassandraDaemon binary. This would be critical for sharing code between CassandraDaemon & management processes. The code should live in-tree to make this easy.
> >
> > With regard to more in-depth features like repair scheduling and discussions around compaction in or out of CassandraDaemon, while the management process may be a suitable host, it is not our goal to decide that at this time. The management process could be used in these cases, as they meet the criteria above, but other technical/architectural reasons may exist for why it should not be.
> >
> > We are looking forward to your comments on our proposal,
>
> Sounds good to me.
>
> Personally, I'm a little less interested in things like health/availability checks and metrics collection, because there are already tools to solve this problem (and most places will already be using them). I'm more interested in things like cluster status, streaming, repair, etc. Something to automate/centralize database-specific command and control, and improve visibility.
>
> In-tree also makes sense (tools/ maybe?), but I would suggest working out of a branch initially, and seeking inclusion when there is something more concrete to discuss.
>
> --
> Eric Evans
> john.eric.ev...@gmail.com
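
The health-check argument in the quoted proposal (a checker living outside the CassandraDaemon JVM can still run when that JVM is in trouble) can be illustrated with a minimal sketch. Nothing below comes from the thread or from Cassandra's code base: the names `HealthReport`, `check_tcp_port`, and `should_recover`, the use of a plain TCP connect as the probe, and the three-consecutive-failures threshold are all assumptions made for illustration. A TCP connect only shows that a port is accepting connections, so a real check would likely need to look deeper (for example at GC pauses, as the proposal mentions).

```python
import socket
import time
from dataclasses import dataclass
from typing import List


@dataclass
class HealthReport:
    """Result of one probe (hypothetical structure, not a Cassandra type)."""
    healthy: bool
    detail: str
    elapsed_s: float


def check_tcp_port(host: str, port: int, timeout_s: float = 2.0) -> HealthReport:
    """Probe a TCP port from a process separate from the one being monitored."""
    start = time.monotonic()
    try:
        # Open and immediately close a connection; success means the port accepts clients.
        with socket.create_connection((host, port), timeout=timeout_s):
            healthy, detail = True, "connection accepted"
    except OSError as exc:  # covers refused, timed out, and unreachable
        healthy, detail = False, f"connection failed: {exc}"
    return HealthReport(healthy, detail, time.monotonic() - start)


def should_recover(reports: List[HealthReport], max_failures: int = 3) -> bool:
    """Suggest recovery only after max_failures consecutive failed probes (assumed policy)."""
    recent = reports[-max_failures:]
    return len(recent) == max_failures and not any(r.healthy for r in recent)
```

A sidecar built along these lines might call `check_tcp_port("127.0.0.1", 9042)` on a timer (9042 being Cassandra's default native transport port), append each report to a list, and trigger whatever recovery action the operator has configured once `should_recover` returns `True`. Requiring several consecutive failures is one simple way to avoid reacting to a single slow probe.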