RE: Cassandra Needs to Grow Up by Version Five!
Hi Michael,

Actually, I do very much like the database. Thanks for the thoughts. A few comments:

1) Lots of big companies (let's see, Apple is a big one) could probably easily justify contributing resources to finish up the basic development of Cassandra.
2) There are lots of big companies using Cassandra. Each could contribute a tiny effort and everyone would benefit greatly.
3) A focused effort by a small group of talented people, like those in this group, could knock it out easily.
4) Not everyone is a Cassandra coder. It's not for me to do, Michael.
5) I'm an individual. I am not working at a big company at the moment, Michael.

Best,
Kenneth Brotman

-----Original Message-----
From: Michael Kjellman [mailto:kjell...@apple.com]
Sent: Sunday, February 18, 2018 10:18 PM
To: dev@cassandra.apache.org
Subject: Re: Cassandra Needs to Grow Up by Version Five!

hi ken,

sorry you don’t like the database. some thoughts:

1) please file actionable jiras for places you feel need to be improved in the database... this is the best way to make and encourage the change you’re looking for. it seems you have quite a few ideas from your post that could be broken down into individual actionable jiras.
2) please don’t cross post between mailing lists.
3) pull requests are always welcomed!

best,
kjellman

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org
Re: Cassandra Needs to Grow Up by Version Five!
Comments inline

> On Feb 18, 2018, at 9:39 PM, Kenneth Brotman wrote:
>
> Cassandra feels like an unfinished program to me. The problem is not that
> it’s open source or cutting edge. It’s an open source cutting edge program
> that lacks some of its basic functionality. We are all stuck addressing
> fundamental mechanical tasks for Cassandra because the basic code that would
> do that part has not been contributed yet.

There’s probably 2-3 reasons why here:

1) Historically the PMC has tried to keep the scope of the project very narrow. It’s a database. We don’t ship drivers. We don’t ship developer tools. We don’t ship fancy UIs. We ship a database. I think for the most part the narrow vision has been for the best, but maybe it’s time to reconsider some of the scope.

Postgres will autovacuum to prevent wraparound (hopefully), but everyone I know running Postgres uses flexible-freeze in cron - sometimes it’s ok to let the database have its opinions and let third party tools fill in the gaps.

2) Cassandra is, by definition, a database for large scale problems. Most of the companies working on/with it tend to be big companies. Big companies often have pre-existing automation that solved the stuff you consider fundamental tasks, so there’s probably nobody actively working on the solved problems that you may consider missing features - for many people they’re already solved.

3) It’s not nearly as basic as you think it is. DataStax seemingly had a multi-person team on OpsCenter, and while it was better than anything else around last time I used it (before it stopped supporting the OSS version), it left a lot to be desired. It’s probably 2-3 engineers working for a month to have any sort of meaningful, reliable, mostly trivial cluster-managing UI, and I can think of about 10 JIRAs I’d rather see that time be spent on first.

> Ease of use issues need to be given much more attention. For an
> administrator, the ease of use of Cassandra is very poor.
>
> Furthermore, currently Cassandra is an idiot. We have to do everything for
> Cassandra. Contrast that with the fact that we are in the dawn of artificial
> intelligence.

And for everything you think is obvious, there’s a 50% chance someone else will have already solved it differently, and your obvious new solution will be seen as an inconvenient assumption and complexity they won’t appreciate. Open source projects get to walk a fine line of trying to be useful without making too many assumptions, being “too” opinionated, or overstepping bounds. We may be too conservative, but it’s very easy to go too far in the opposite direction.

> Software exists to automate tasks for humans, not mechanize humans to
> administer tasks for a database. I’m an engineering type. My job is to
> apply science and technology to solve real world problems. And that’s where
> I need an organization’s I.T. talent to focus; not in crank starting an
> unfinished database.

And that’s why nobody’s done it - we all have bigger problems we’re being paid to solve, and nobody’s felt it necessary. Because it’s not necessary; it’s nice, but not required.

> For example, I should be able to go to any node, replace the Cassandra.yaml
> file and have a prompt on the display ask me if I want to update all the yaml
> files across the cluster. I shouldn’t have to manually modify yaml files on
> each node or have to create a script for some third party automation tool to
> do it.

I don’t see this ever happening. Your config management already pushes files around your infrastructure; Cassandra doesn’t need to do it.

> I should not have to turn off service, clear directories, restart service in
> coordination with the other nodes. It’s already a computer system. It can
> do those things on its own.

The only time you should be doing this is when you’re wiping nodes from failed bootstrap, and that stopped being required in 2.2.

> How about read repair. First there is something wrong with the name. Maybe
> it should be called Consistency Repair. An administrator shouldn’t have to
> do anything. It should be a behavior of Cassandra that is programmed in. It
> should consider the GC setting of each node, calculate how often it has to
> run repair, when it should run it so all the nodes aren’t trying at the same
> time and when other circumstances indicate it should also run it.

There’s a good argument to be made that something like Reaper should be shipped with Cassandra. There’s another good argument that most tools like this end up needing some sort of leader election for scheduling and that goes against a lot of the fundamental assumptions in Cassandra (all nodes are equal, etc) - solving that problem is probably at least part of why you haven’t seen them built into the db. “Leader election is easy” you’ll say, and I’ll laugh and tell you about users I know who have DCs go offline for weeks at a time.
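
The scheduling request quoted above (repair every node within the tombstone GC window, without all nodes repairing at once) can be illustrated with a small sketch. This is only a minimal illustration in Python, not how Cassandra or Reaper actually schedules repairs. The function name repair_schedule, the safety_factor parameter, and the node addresses are hypothetical. It also assumes every node sees the same membership list, which is precisely the hard part raised in the reply above: with a datacenter offline for weeks, that assumption does not hold.

```python
from datetime import timedelta


def repair_schedule(nodes, gc_grace_seconds, safety_factor=0.5):
    """Give each node a distinct start offset inside one repair cycle.

    The cycle length is a fraction (safety_factor, hypothetical) of
    gc_grace_seconds, so a full pass over all nodes finishes well
    inside the GC window. Offsets are derived from the sorted node
    list, so any node holding the same list computes the same answer
    without asking a coordinator.
    """
    if not nodes:
        return {}
    cycle_seconds = gc_grace_seconds * safety_factor
    slot_seconds = cycle_seconds / len(nodes)
    return {
        node: timedelta(seconds=index * slot_seconds)
        for index, node in enumerate(sorted(nodes))
    }


if __name__ == "__main__":
    # 864000 seconds (10 days) is Cassandra's default gc_grace_seconds.
    schedule = repair_schedule(["10.0.0.3", "10.0.0.1", "10.0.0.2"], 864000)
    for node, offset in schedule.items():
        print(node, "starts repair at cycle offset", offset)
```

Spreading the start offsets evenly keeps nodes from repairing at the same moment only if each repair finishes within its slot; a real scheduler would also need to track repair duration, failures, and topology changes.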
Re: Cassandra Needs to Grow Up by Version Five!
hi ken,

sorry you don’t like the database. some thoughts:

1) please file actionable jiras for places you feel need to be improved in the database... this is the best way to make and encourage the change you’re looking for. it seems you have quite a few ideas from your post that could be broken down into individual actionable jiras.
2) please don’t cross post between mailing lists.
3) pull requests are always welcomed!

best,
kjellman
Cassandra Needs to Grow Up by Version Five!
Cassandra feels like an unfinished program to me. The problem is not that it's open source or cutting edge. It's an open source cutting edge program that lacks some of its basic functionality. We are all stuck addressing fundamental mechanical tasks for Cassandra because the basic code that would do that part has not been contributed yet.

Ease of use issues need to be given much more attention. For an administrator, the ease of use of Cassandra is very poor.

Furthermore, currently Cassandra is an idiot. We have to do everything for Cassandra. Contrast that with the fact that we are in the dawn of artificial intelligence.

Software exists to automate tasks for humans, not mechanize humans to administer tasks for a database. I'm an engineering type. My job is to apply science and technology to solve real world problems. And that's where I need an organization's I.T. talent to focus; not in crank starting an unfinished database.

For example, I should be able to go to any node, replace the Cassandra.yaml file and have a prompt on the display ask me if I want to update all the yaml files across the cluster. I shouldn't have to manually modify yaml files on each node or have to create a script for some third party automation tool to do it.

I should not have to turn off service, clear directories, restart service in coordination with the other nodes. It's already a computer system. It can do those things on its own.

How about read repair. First there is something wrong with the name. Maybe it should be called Consistency Repair. An administrator shouldn't have to do anything. It should be a behavior of Cassandra that is programmed in. It should consider the GC setting of each node, calculate how often it has to run repair, when it should run it so all the nodes aren't trying at the same time and when other circumstances indicate it should also run it.

Certificate management should be automated.

Cluster wide management should be a big theme in any next major release. What is a major release? How many major releases could a program have before all the coding for basic stuff like installation, configuration and maintenance is included!

Finish the basic coding of Cassandra, make it easy to use for administrators, make it smart, add cluster wide management. Keep Cassandra competitive or it will soon be the old Model T we all remember fondly.

I ask the Committee to compile a list of all such items, make a plan, and commit to including the completed and tested code as part of major release 5.0. I further ask that release 4.0 not be delayed and then there be an unusually short skip to version 5.0.

Kenneth Brotman