>> Given the sidecar is running on the same node as the main C* process, the
>> only real resource isolation you have is in heap/GC? CPU/Memory/IO are all
>> still shared between the main C* process and the sidecar, and coordinating
>> those across processes is harder than coordinating them within a single
>> process. For example, if we wanted to have the compaction throughput,
>> streaming throughput, and analytics read throughput all tied back to a
>> single disk IO cap, that is harder with an external process.
>
> Relatively trivial, for CPU and memory, to run them in different
> containers/cgroups/etc, so you can put an exact cpu/memory limit on the
> sidecar. That's different from a jmx rate limiter/throttle, but (arguably)
> more precise, because it actually limits the underlying physical resource
> instead of a proxy for it in a config setting.
>
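For concreteness, the cgroup/container-based limiting described above might look something like the following on a systemd host. This is only an illustrative sketch: the unit name, jar path, and limit values are hypothetical, not recommendations.

```shell
# Hypothetical example: run the sidecar under its own transient systemd
# unit so that cgroup (v2) limits cap its CPU, memory, and IO weight.
# Unit name, jar path, and limit values are illustrative only.
sudo systemd-run --unit=cassandra-sidecar \
  -p CPUQuota=200% \
  -p MemoryMax=4G \
  -p IOWeight=100 \
  java -jar /opt/cassandra-sidecar/sidecar.jar

# The same limits could be expressed in a persistent unit file:
#   [Service]
#   CPUQuota=200%
#   MemoryMax=4G
#   IOWeight=100
```

This caps the physical resources themselves rather than a throughput proxy in a config setting, which is the distinction being drawn above.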
If we want to bring cgroups/containers/etc into the default deployment
mechanisms of C*, great. I am all for dividing C* up into multiple micro
services, given we solve all the problems I listed in the complexity section,
but the project needs to buy in to containers as the default mechanism for
running it for that to be viable in my mind.

>
>>
>>> - Complexity. Considering the existence of the Sidecar project, it would be
>>> less complex to avoid adding another (http?) service in Cassandra.
>>
>> Not sure that is really very complex; running an http service is pretty
>> easy? We already have netty in use to instantiate one from.
>>
>> I worry more about the complexity of having the matching schema for a set of
>> sstables being read. The complexity of new sstable versions/formats being
>> introduced. The complexity of having up-to-date data from memtables being
>> considered by this API without having to flush before every query of it.
>> The complexity of dealing with the new memtable API introduced in CEP-11.
>> The complexity of coordinating compaction/streaming adding and removing
>> files with these APIs reading them. There are a lot of edge cases to
>> consider for this external access to sstables that the main process
>> considers itself the “owner” of.
>>
>> All of this is not to say that I think separating things out into other
>> processes/services is bad. But I think we need to be very careful with how
>> we do it, or end users will end up running into all the sharp edges and the
>> feature will fail.
>>
>> -Jeremiah
>>
>>> On Mar 24, 2023, at 8:15 PM, Yifan Cai <yc25c...@gmail.com> wrote:
>>>
>>> Hi Jeremiah,
>>>
>>> There are good reasons to not have these inside Cassandra. Consider the
>>> following.
>>> - Resource isolation. Having the said service running within the same JVM
>>> may negatively impact Cassandra storage's performance. It could be more
>>> beneficial to have them in Sidecar, which offers strong resource isolation
>>> guarantees.
>>> - Availability. If the Cassandra cluster is being bounced, using Sidecar
>>> would not affect the SBR/SBW functionality, e.g. SBR can still read
>>> SSTables via Sidecar endpoints.
>>> - Compatibility. Sidecar provides stable REST-based APIs, such as the
>>> SSTable upload endpoint, which would remain compatible with different
>>> versions of Cassandra. The current implementation supports versions 3.0
>>> and 4.0.
>>> - Complexity. Considering the existence of the Sidecar project, it would be
>>> less complex to avoid adding another (http?) service in Cassandra.
>>> - Release velocity. Sidecar, as an independent project, can have a quicker
>>> release cycle than Cassandra.
>>> - The features in Sidecar are mostly implemented based on various existing
>>> tools/APIs exposed from Cassandra, e.g. ring, commit sstable, snapshot, etc.
>>>
>>> Regarding authentication and authorization:
>>> - We will add it as a follow-on CEP in Sidecar, but we don't want to hold
>>> up this CEP. It would be a feature that benefits all Sidecar endpoints.
>>>
>>> - Yifan
>>>
>>> On Fri, Mar 24, 2023 at 2:43 PM Doug Rohrer <droh...@apple.com> wrote:
>>>> I agree that the analytics library will need to support vnodes. To be
>>>> clear, there’s nothing preventing the solution from working with vnodes
>>>> right now, and no assumptions about a 1:1 topology between a token and a
>>>> node. However, we don’t, today, have the ability to test vnode support
>>>> end-to-end. We are working towards that, and should be able to remove
>>>> the caveat from the released analytics library once we can properly
>>>> test vnode support.
>>>> If it helps, I can update the CEP to say something more like “Caveat:
>>>> Currently untested with vnodes - work is ongoing to remove this
>>>> limitation”?
>>>>
>>>> Doug
>>>>
>>>> > On Mar 24, 2023, at 11:43 AM, Brandon Williams <dri...@gmail.com> wrote:
>>>> >
>>>> > On Fri, Mar 24, 2023 at 10:39 AM Jeremiah D Jordan
>>>> > <jeremiah.jor...@gmail.com> wrote:
>>>> >>
>>>> >> I have concerns with the majority of this being in the sidecar and not
>>>> >> in the database itself. I think it would make sense for the server side
>>>> >> of this to be a new service exposed by the database, not in the sidecar.
>>>> >> That way it can properly integrate with the authentication and
>>>> >> authorization APIs, and be a first-class citizen in terms of having
>>>> >> unit/integration tests in the main DB ensuring no one breaks it.
>>>> >
>>>> > I don't think this can/should happen until it supports the database's
>>>> > default configuration with vnodes.