RE: Cassandra Needs to Grow Up by Version Five!

2018-02-18 Thread Kenneth Brotman
Hi Michael, actually I do very much like the database.  Thanks for the 
thoughts... a few comments:

1) Lots of big companies (Apple is a big one, for instance) could probably 
easily justify contributing resources to finish up the basic development of 
Cassandra.
2) There are lots of big companies using Cassandra.  Each could contribute a 
tiny effort and everyone would benefit greatly.
3) A focused effort by a small group of talented people, like those in this 
group, could knock it out easily.
4) Not everyone is a Cassandra coder.  It's not for me to do, Michael.
5) I'm an individual.  I am not working at a big company at the moment, Michael.

Best,
Kenneth Brotman


-----Original Message-----
From: Michael Kjellman [mailto:kjell...@apple.com] 
Sent: Sunday, February 18, 2018 10:18 PM
To: dev@cassandra.apache.org
Subject: Re: Cassandra Needs to Grow Up by Version Five!

hi ken, sorry you don’t like the database. some thoughts:

1) please file actionable jiras for places you feel need to be improved in the 
database... this is the best way to make and encourage the change you’re 
looking for. it seems you have quite a few ideas from your post that could be 
broken down into individual actionable jiras.
2) please don’t cross post between mailing lists.
3) pull requests are always welcomed!

best,
kjellman


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org






Re: Cassandra Needs to Grow Up by Version Five!

2018-02-18 Thread Jeff Jirsa
Comments inline 


> On Feb 18, 2018, at 9:39 PM, Kenneth Brotman wrote:
> 
> Cassandra feels like an unfinished program to me. The problem is not that 
> it’s open source or cutting edge.  It’s an open source cutting edge program 
> that lacks some of its basic functionality.  We are all stuck addressing 
> fundamental mechanical tasks for Cassandra because the basic code that would 
> do that part has not been contributed yet.
> 
There are probably 2-3 reasons why:

1) Historically the PMC has tried to keep the scope of the project very narrow. 
It’s a database. We don’t ship drivers. We don’t ship developer tools. We don’t 
ship fancy UIs. We ship a database. I think for the most part the narrow vision 
has been for the best, but maybe it’s time to reconsider some of the scope. 

Postgres will autovacuum to prevent wraparound (hopefully),  but everyone I 
know running Postgres uses flexible-freeze in cron - sometimes it’s ok to let 
the database have its opinions and let third party tools fill in the gaps.

2) Cassandra is, by definition, a database for large scale problems. Most of 
the companies working on/with it tend to be big companies. Big companies often 
have pre-existing automation that solved the stuff you consider fundamental 
tasks, so there’s probably nobody actively working on the solved problems that 
you may consider missing features - for many people they’re already solved.

3) It’s not nearly as basic as you think it is. DataStax seemingly had a 
multi-person team on OpsCenter, and while it was better than anything else 
around last time I used it (before it stopped supporting the OSS version), it 
left a lot to be desired. It’s probably 2-3 engineers working for a month to 
have any sort of meaningful, reliable, mostly trivial cluster-managing UI, and 
I can think of about 10 JIRAs I’d rather see that time be spent on first. 

> Ease of use issues need to be given much more attention.  For an 
> administrator, the ease of use of Cassandra is very poor. 
> 
> Furthermore, currently Cassandra is an idiot.  We have to do everything for 
> Cassandra. Contrast that with the fact that we are in the dawn of artificial 
> intelligence.
> 

And for everything you think is obvious, there’s a 50% chance someone else will 
have already solved it differently, and your obvious new solution will be seen as 
an inconvenient assumption and complexity they won’t appreciate. Open source 
projects get to walk a fine line of trying to be useful without making too many 
assumptions, being “too” opinionated, or overstepping bounds. We may be too 
conservative, but it’s very easy to go too far in the opposite direction. 

> Software exists to automate tasks for humans, not mechanize humans to 
> administer tasks for a database.  I’m an engineering type.  My job is to 
> apply science and technology to solve real world problems.  And that’s where 
> I need an organization’s I.T. talent to focus; not in crank starting an 
> unfinished database.
> 

And that’s why nobody’s done it - we all have bigger problems we’re being paid 
to solve, and nobody’s felt it necessary. Because it isn’t necessary: it’s 
nice, but not required.

> For example, I should be able to go to any node, replace the cassandra.yaml 
> file and have a prompt on the display ask me if I want to update all the yaml 
> files across the cluster.  I shouldn’t have to manually modify yaml files on 
> each node or have to create a script for some third party automation tool to 
> do it. 
> 
I don’t see this ever happening.  Your config management already pushes files 
around your infrastructure; Cassandra doesn’t need to do it. 

> I should not have to turn off service, clear directories, restart service in 
> coordination with the other nodes.  It’s already a computer system.  It can 
> do those things on its own.
> 

The only time you should be doing this is when you’re wiping nodes after a failed 
bootstrap, and that stopped being required in 2.2.

> How about read repair?  First there is something wrong with the name.  Maybe 
> it should be called Consistency Repair.  An administrator shouldn’t have to 
> do anything.  It should be a behavior of Cassandra that is programmed in. It 
> should consider the GC setting of each node, calculate how often it has to 
> run repair, when it should run it so all the nodes aren’t trying at the same 
> time and when other circumstances indicate it should also run it.
> 
There’s a good argument to be made that something like Reaper should be shipped 
with Cassandra. There’s another good argument that most tools like this end up 
needing some sort of leader election for scheduling and that goes against a lot 
of the fundamental assumptions in Cassandra (all nodes are equal, etc) - 
solving that problem is probably at least part of why you haven’t seen them 
built into the db. “Leader election is easy” you’ll say, and I’ll laugh and 
tell you about users I know who have DCs go offline for weeks at a time. 
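
To make the scheduling half of that concrete, here is a minimal sketch of 
staggering repair start times across nodes. It is not how Cassandra or Reaper 
does it. It assumes the "GC setting" in the original post means the table 
option gc_grace_seconds, that a full repair cycle should finish inside that 
window, and that one process gets to compute the schedule. The function name, 
the node names and the safety_factor parameter are all made up for 
illustration.

```python
from datetime import timedelta


def staggered_repair_schedule(nodes, gc_grace_seconds, safety_factor=0.5):
    """Give each node a repair start offset inside one repair cycle.

    Hypothetical helper for illustration; not part of Cassandra or Reaper.
    """
    if not nodes:
        raise ValueError("need at least one node")
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in (0, 1]")

    # Finish a full cycle well inside gc_grace_seconds so that tombstones
    # are repaired everywhere before they become eligible for purging.
    cycle = timedelta(seconds=gc_grace_seconds * safety_factor)

    # One equal slot per node, so no two nodes start at the same time.
    slot = cycle / len(nodes)
    return cycle, {node: slot * i for i, node in enumerate(nodes)}


# 864000 seconds (10 days) is Cassandra's default gc_grace_seconds.
cycle, offsets = staggered_repair_schedule(
    ["node1", "node2", "node3", "node4", "node5"], gc_grace_seconds=864000
)
for node, offset in offsets.items():
    print(f"{node}: start repair {offset} into each {cycle} cycle")
```

The arithmetic is the easy part. The sketch only spaces out start times; it 
does not check that a repair actually finishes inside its slot, and it says 
nothing about which node gets to compute the schedule or what happens when a 
DC is unreachable, which is the coordination problem described above.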


Re: Cassandra Needs to Grow Up by Version Five!

2018-02-18 Thread Michael Kjellman
hi ken, sorry you don’t like the database. some thoughts:

1) please file actionable jiras for places you feel need to be improved in the 
database... this is the best way to make and encourage the change you’re 
looking for. it seems you have quite a few ideas from your post that could be 
broken down into individual actionable jiras.
2) please don’t cross post between mailing lists.
3) pull requests are always welcomed!

best,
kjellman

> On Feb 18, 2018, at 9:39 PM, Kenneth Brotman wrote:
> 
> Cassandra feels like an unfinished program to me.  The problem is not that
> it's open source or cutting edge.  It's an open source cutting edge program
> that lacks some of its basic functionality.  We are all stuck addressing
> fundamental mechanical tasks for Cassandra because the basic code that would
> do that part has not been contributed yet.
> 
> Ease of use issues need to be given much more attention.  For an
> administrator, the ease of use of Cassandra is very poor.  
> 
> Furthermore, currently Cassandra is an idiot.  We have to do everything for
> Cassandra. Contrast that with the fact that we are in the dawn of artificial
> intelligence.
> 
> Software exists to automate tasks for humans, not mechanize humans to
> administer tasks for a database.  I'm an engineering type.  My job is to
> apply science and technology to solve real world problems.  And that's where
> I need an organization's I.T. talent to focus; not in crank starting an
> unfinished database.
> 
> For example, I should be able to go to any node, replace the cassandra.yaml
> file and have a prompt on the display ask me if I want to update all the
> yaml files across the cluster.  I shouldn't have to manually modify yaml
> files on each node or have to create a script for some third party
> automation tool to do it.  
> 
> I should not have to turn off service, clear directories, restart service in
> coordination with the other nodes.  It's already a computer system.  It can
> do those things on its own.
> 
> How about read repair?  First there is something wrong with the name.  Maybe
> it should be called Consistency Repair.  An administrator shouldn't have to
> do anything.  It should be a behavior of Cassandra that is programmed in. It
> should consider the GC setting of each node, calculate how often it has to
> run repair, when it should run it so all the nodes aren't trying at the same
> time and when other circumstances indicate it should also run it.
> 
> Certificate management should be automated.
> 
> Cluster wide management should be a big theme in any next major release.
> What is a major release?  How many major releases could a program have
> before all the coding for basic stuff like installation, configuration and
> maintenance is included!
> 
> Finish the basic coding of Cassandra, make it easy to use for
> administrators, make it smart, add cluster wide management.  Keep Cassandra
> competitive or it will soon be the old Model T we all remember fondly.
> 
> I ask the Committee to compile a list of all such items, make a plan, and
> commit to including the completed and tested code as part of major release
> 5.0.  I further ask that release 4.0 not be delayed and then there be an
> unusually short skip to version 5.0. 
> 
> Kenneth Brotman
> 




Cassandra Needs to Grow Up by Version Five!

2018-02-18 Thread Kenneth Brotman
Cassandra feels like an unfinished program to me.  The problem is not that
it's open source or cutting edge.  It's an open source cutting edge program
that lacks some of its basic functionality.  We are all stuck addressing
fundamental mechanical tasks for Cassandra because the basic code that would
do that part has not been contributed yet.

Ease of use issues need to be given much more attention.  For an
administrator, the ease of use of Cassandra is very poor.  

Furthermore, currently Cassandra is an idiot.  We have to do everything for
Cassandra. Contrast that with the fact that we are in the dawn of artificial
intelligence.

Software exists to automate tasks for humans, not mechanize humans to
administer tasks for a database.  I'm an engineering type.  My job is to
apply science and technology to solve real world problems.  And that's where
I need an organization's I.T. talent to focus; not in crank starting an
unfinished database.

For example, I should be able to go to any node, replace the cassandra.yaml
file and have a prompt on the display ask me if I want to update all the
yaml files across the cluster.  I shouldn't have to manually modify yaml
files on each node or have to create a script for some third party
automation tool to do it.  

I should not have to turn off service, clear directories, restart service in
coordination with the other nodes.  It's already a computer system.  It can
do those things on its own.

How about read repair?  First there is something wrong with the name.  Maybe
it should be called Consistency Repair.  An administrator shouldn't have to
do anything.  It should be a behavior of Cassandra that is programmed in. It
should consider the GC setting of each node, calculate how often it has to
run repair, when it should run it so all the nodes aren't trying at the same
time and when other circumstances indicate it should also run it.

Certificate management should be automated.

Cluster wide management should be a big theme in any next major release.
What is a major release?  How many major releases could a program have
before all the coding for basic stuff like installation, configuration and
maintenance is included!

Finish the basic coding of Cassandra, make it easy to use for
administrators, make it smart, add cluster wide management.  Keep Cassandra
competitive or it will soon be the old Model T we all remember fondly.

I ask the Committee to compile a list of all such items, make a plan, and
commit to including the completed and tested code as part of major release
5.0.  I further ask that release 4.0 not be delayed and then there be an
unusually short skip to version 5.0. 

Kenneth Brotman