Re: Cassandra Needs to Grow Up by Version Five!

Kyrylo Lebediev Tue, 20 Feb 2018 03:41:15 -0800

Agree with you, Daniel, regarding gaps in documentation.

---

At the same time, I disagree with the folks complaining in this thread 
that some functionality, such as 'advanced backup', is missing out of the box.

We live in a time when there are literally tons of open-source tools 
(automation, monitoring) and languages available, and there are also some 
really powerful SaaS solutions on the market that support C* (Datadog, for 
instance).

For example, C* provides only basic building blocks for anti-entropy repairs 
[I mean that basic usage of 'nodetool repair' is not suitable for large 
production clusters], but Reaper (many thanks to Spotify and TheLastPickle!), 
which builds on this basic functionality, solves the task very well for 
real-world C* setups.
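
To illustrate the "building blocks" point: one common way to make repairs 
manageable on a large cluster is to repair the token ring in small segments 
rather than all at once. Below is a minimal Python sketch of that idea, not a 
description of how Reaper is implemented. It assumes the Murmur3Partitioner 
token range (-2^63 to 2^63-1) and the '-st' / '-et' options of 
'nodetool repair'; the function names and the keyspace name are hypothetical, 
so check the flags against your own nodetool version before using anything 
like this.

```python
# Sketch only: split the Murmur3 token ring into equal segments and build
# one 'nodetool repair' command line per segment. Nothing is executed here.
MIN_TOKEN = -2**63      # assumed: Murmur3Partitioner lower bound
MAX_TOKEN = 2**63 - 1   # assumed: Murmur3Partitioner upper bound


def token_segments(count):
    """Return `count` contiguous (start, end) token pairs covering the ring."""
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // count for i in range(count + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def repair_commands(keyspace, count):
    """Build hypothetical per-segment repair command strings for a keyspace."""
    return [
        f"nodetool repair -st {start} -et {end} {keyspace}"
        for start, end in token_segments(count)
    ]


if __name__ == "__main__":
    # 'my_keyspace' is a placeholder name.
    for command in repair_commands("my_keyspace", 4):
        print(command)
```

A scheduler would then run these commands one at a time, with retries and 
pauses between segments, which is the part a dedicated tool takes off your 
hands.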

If something is missing or could be improved in your opinion: we're in the 
era of open source. Create your own tool, let's say for automating C* backups 
using EBS snapshots, and upload it to GitHub.

C* is a DB engine, not a fully automated, self-contained suite.
End users are able to work on automating routine tasks [3rd-party projects], 
while C* contributors can focus on core functionality.

--

Going back to the documentation topic: as far as I understand, DataStax is no 
longer the main C* contributor and is focused on its own C*-based proprietary 
software [someone correct me if I'm wrong].

This has led to a situation where development of C* is progressing (as far 
as I understand, the work is done mainly by some large C* users that have 
enough resources to contribute to the C* project to get the features they 
need), but no single company has taken over keeping the C* documentation / 
Wiki up to date.

Honestly, even DataStax's documentation is too concise and is missing a lot 
of important details.

[BTW, I've just taken a look at https://cassandra.apache.org/doc/latest/ and 
it doesn't look that 'bad': despite the TODOs, it contains a lot of valuable 
information]

So I feel the C* Community has to join efforts on enriching the existing 
documentation / resurrecting the Wiki [where how-tos, information about 
3rd-party automations and integrations, etc. can be placed].

By the Community I mean all of us including myself.

Regards,

Kyrill

________________________________
From: Daniel Hölbling-Inzko <daniel.hoelbling-in...@bitmovin.com>
Sent: Tuesday, February 20, 2018 11:28:13 AM
To: user@cassandra.apache.org; James Briggs
Cc: d...@cassandra.apache.org
Subject: Re: Cassandra Needs to Grow Up by Version Five!

Hi,

I have to add my own two cents here, as the main thing that keeps me from 
really running Cassandra is the amount of pain running it incurs.
Not so much because it's actually painful, but because the tools are so 
different and the documentation and best practices are scattered across a dozen 
outdated DataStax articles, this mailing list, etc. We've been hesitant 
(although our use case is perfect for Cassandra) to deploy Cassandra to 
any critical systems, as even after a year of running it we still don't have 
the operational experience to confidently run critical systems with it.

Simple things like a foolproof / safe cluster-wide S3 backup (like 
Elasticsearch has) would, for example, solve a TON of issues for new people. I 
don't need it auto-scheduled or anything, but having to configure cron jobs 
across the whole cluster is a pain in the ass for small teams.
To be honest, even the way snapshots are done right now is already super 
painful. Every other system I have operated so far just creates one backup 
folder I can export; in C* the backup is scattered across a bunch of different 
keyspace folders etc. Needless to say, it took a while until I trusted my 
backup scripts fully.
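
To make the "scattered" part concrete, here is a minimal Python sketch of the 
kind of glue script this ends up requiring. It assumes the usual on-disk 
layout '<data_dir>/<keyspace>/<table>/snapshots/<tag>/' and copies every 
table's snapshot for one tag into a single export folder. The function name 
'collect_snapshot' and all paths are hypothetical, and it only handles one 
node's data directory; it is an illustration, not a cluster-wide backup tool.

```python
import shutil
from pathlib import Path


def collect_snapshot(data_dir, tag, dest):
    """Copy each table's snapshot named `tag` into one folder under `dest`.

    Assumed layout: <data_dir>/<keyspace>/<table>/snapshots/<tag>/
    Result layout:  <dest>/<keyspace>/<table>/
    Returns the list of destination directories that were created.
    """
    data_dir, dest = Path(data_dir), Path(dest)
    copied = []
    for snap in sorted(data_dir.glob(f"*/*/snapshots/{tag}")):
        if not snap.is_dir():
            continue
        # parts[-4] is the keyspace directory, parts[-3] the table directory.
        keyspace, table = snap.parts[-4], snap.parts[-3]
        target = dest / keyspace / table
        shutil.copytree(snap, target)  # creates missing parent directories
        copied.append(target)
    return copied


if __name__ == "__main__":
    # Placeholder paths and tag; adjust to your own installation.
    for path in collect_snapshot("/var/lib/cassandra/data", "nightly", "/tmp/export"):
        print(path)
```

The resulting folder can then be archived or uploaded as a single unit, 
which is roughly what the other systems I mentioned give you for free.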

And especially for a database, I believe backup/restore needs to be a 
non-issue that's documented front and center. If not, smaller teams just don't 
have the resources to dedicate to learning and building the tools around it.

Now that the team is getting larger, we could spare the resources to operate 
these things, but switching from a well-understood RDBMS schema to Cassandra is 
now incredibly hard and will probably take years.

greetings Daniel

On Tue, 20 Feb 2018 at 05:56 James Briggs <james.bri...@yahoo.com.invalid> 
wrote:
Kenneth:

What you said is not wrong.

Vertica and Riak are examples of distributed databases that don't require 
hand-holding.

Cassandra is for Java-programmer DIYers, or more often Datastax clients, at 
this point.
Thanks, James.

________________________________
From: Kenneth Brotman <kenbrot...@yahoo.com.INVALID>
To: user@cassandra.apache.org
Cc: d...@cassandra.apache.org
Sent: Monday, February 19, 2018 4:56 PM

Subject: RE: Cassandra Needs to Grow Up by Version Five!

Jeff, you helped me figure out what I was missing.  It just took me a day to 
digest what you wrote.  I’m coming over from another type of engineering.  I 
didn’t know, and it’s not really documented.  Cassandra runs in a data center.  
Nowadays that means the nodes are going to be in managed containers, Docker 
containers, managed by Kubernetes, Mesos or something, and for that reason 
anyone operating Cassandra in a real-world setting would not encounter the 
issues I raised in the way I described.

Shouldn’t the architectural diagrams people reference indicate that in some 
way?  That would have helped me.

Kenneth Brotman

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com]
Sent: Monday, February 19, 2018 10:43 AM
To: 'user@cassandra.apache.org'
Cc: 'd...@cassandra.apache.org'
Subject: RE: Cassandra Needs to Grow Up by Version Five!

Well said.  Very fair.  I wouldn’t mind hearing from others still.  You’re a 
good guy!

Kenneth Brotman

From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Monday, February 19, 2018 9:10 AM
To: cassandra
Cc: Cassandra DEV
Subject: Re: Cassandra Needs to Grow Up by Version Five!

There's a lot of things below I disagree with, but it's ok. I convinced myself 
not to nit-pick every point.

https://issues.apache.org/jira/browse/CASSANDRA-13971 has some of Stefan's work 
with cert management

Beyond that, I encourage you to do what Michael suggested: open JIRAs for 
things you care strongly about, and work on them if you have time. Sometime 
this year we'll schedule an NGCC (Next Generation Cassandra Conference) where 
we talk about future project work and direction. I encourage you to attend if 
you're able (I encourage anyone who cares about the direction of Cassandra to 
attend; it'll probably be either free or very low cost, just to cover a venue 
and some food). If nothing else, you'll meet some of the teams who are working 
on the project, and learn why they've selected the projects on which they're 
working. You'll have an opportunity to pitch your vision, and maybe you can 
talk some folks into helping out.

- Jeff

On Mon, Feb 19, 2018 at 1:01 AM, Kenneth Brotman 
<kenbrot...@yahoo.com.invalid> wrote:
Comments inline

>-----Original Message-----
>From: Jeff Jirsa [mailto:jji...@gmail.com]
>Sent: Sunday, February 18, 2018 10:58 PM
>To: user@cassandra.apache.org
>Cc: d...@cassandra.apache.org
>Subject: Re: Cassandra Needs to Grow Up by Version Five!
>
>Comments inline
>
>
>> On Feb 18, 2018, at 9:39 PM, Kenneth Brotman 
>> <kenbrot...@yahoo.com.INVALID> wrote:
>>
>> Cassandra feels like an unfinished program to me. The problem is not that 
>> it’s open source or cutting edge.  It’s an open source cutting edge program 
>> that lacks some of its basic functionality.  We are all stuck addressing 
>> fundamental mechanical tasks for Cassandra because the basic code that would 
>> do that part has not been contributed yet.
>>
>There’s probably 2-3 reasons why here:
>
>1) Historically the pmc has tried to keep the scope of the project very 
>narrow. It’s a database. We don’t ship drivers. We don’t ship developer tools. 
>We don’t ship fancy UIs. We ship a database. I think for the most part the 
>narrow vision has been for the best, but maybe it’s time to reconsider some of 
>the scope.
>
>Postgres will autovacuum to prevent wraparound (hopefully),  but everyone I 
>know running Postgres uses flexible-freeze in cron - sometimes it’s ok to let 
>the database have its opinions and let third party tools fill in the gaps.
>

I can appreciate the desire to stay in scope.  I believe usability is king.  
When users have to learn the database, then learn what they have to automate, 
then learn an automation tool, and then use the automation tool to do something 
as fundamental as the tasks I described, then something is missing from the 
database itself that is adversely affecting usability - and that is very bad.  
Where those big companies need to calculate the ROI is in the cost of acquiring 
or training the next group of users.  Consider how steep the learning curve is 
for new users.  Consider the business case for improving ease of use.

>2) Cassandra is, by definition, a database for large scale problems. Most of 
>the companies working on/with it tend to be big companies. Big companies often 
>have pre-existing automation that solved the stuff you consider fundamental 
>tasks, so there’s probably nobody actively working on the solved problems that 
>you may consider missing features - for many people they’re already solved.
>

I could be wrong but it sounds like a lot of the code work is done, and if the 
companies would take the time to contribute more code, then the rest of the 
code needed could be generated easily.

>3) It’s not nearly as basic as you think it is. Datastax seemingly had a 
>multi-person team on opscenter, and while it was better than anything else 
>around last time I used it (before it stopped supporting the OSS version), it 
>left a lot to be desired. It’s probably 2-3 engineers working for a month  to 
>have any sort of meaningful, reliable, mostly trivial cluster-managing UI, and 
>I can think of about 10 JIRAs I’d rather see that time be spent on first.

How about 6-9 engineers working 12 months a year on it, then?  I'm not 
kidding.  For a big company with revenues in the tens of billions or more, and 
a heavy use of Cassandra nodes, it's easy to make a case for having a full-time 
person or more that involved.  They aren't paying for using the open source 
code that is Cassandra.  Let's see: what would the licensing fees be for a big 
company if the costs were like what Microsoft or Oracle would charge for their 
enterprise-level relational database?  What's the contribution of one or two 
people in comparison?

>> Ease of use issues need to be given much more attention.  For an 
>> administrator, the ease of use of Cassandra is very poor.
>>
>>Furthermore, currently Cassandra is an idiot.  We have to do everything for 
>>Cassandra. Contrast that with the fact that we are in the dawn of artificial 
>>intelligence.
>>
>
>And for everything you think is obvious, there’s a 50% chance someone else 
>will have already solved differently, and your obvious new solution will be 
>seen as an inconvenient assumption and complexity they won’t appreciate. Open 
>source projects get to walk a fine line of trying to be useful without making 
>too many assumptions, being “too” opinionated, or overstepping bounds. We may 
>be too conservative, but it’s very easy to go too far in the opposite 
>direction.
>

I appreciate that, but when such concerns result in inaction instead of 
resolution, that is no good.

>> Software exists to automate tasks for humans, not mechanize humans to 
>> administer tasks for a database.  I’m an engineering type.  My job is to 
>> apply science and technology to solve real world problems.  And that’s where 
>> I need an organization’s I.T. talent to focus; not in crank starting an 
>> unfinished database.
>>
>
>And that’s why nobody’s done it - we all have bigger problems we’re being paid 
>to solve, and nobody’s felt it necessary. Because it’s not necessary, it’s 
>nice, but not required.
>

Of course you would say that, you're Jeff Jirsa.  In apprenticeship speak, 
you’re a master.  It's the classic challenge of trying to get a master to see 
the legitimate issues of the apprentices.  I do appreciate the time you give to 
answer posts to the groups, like this post.  So I don't want you to take 
anything the wrong way.  Where it's going to bite everyone is in the future 
adoption rate.  It has to be addressed.

[snip]

>> Certificate management should be automated.
>>
>Stefan (in particular) has done a fair amount of work on this, but I’d bet 90% 
>of users don’t use ssl and genuinely don’t care.
>

I didn't realize.  Could I trouble you for a link so I could get up to speed?

>> Cluster wide management should be a big theme in any next major release.
>>
>Na. Stability and testing should be a big theme in the next major release.
>

Double Na on that one, Jeff.  I think you have a concern there about the need 
to test sufficiently to ensure the stability of the next major release.  That 
makes perfect sense - for every release, especially the major ones.  Continuous 
improvement is not a phase of development, for example.  CI should be in 
everything, in every phase.  Stability and testing are a part of every release, 
not just one.  A major release should be a nice step up from the previous major 
release, though.

>> What is a major release?  How many major releases could a program have 
>> before all the coding for basic stuff like installation, configuration and 
>> maintenance is included!
>>
>> Finish the basic coding of Cassandra, make it easy to use for 
>> administrators, make it smart, add cluster wide management.  Keep Cassandra 
>> competitive or it will soon be the old Model T we all remember fondly.
>>
>
>Let’s keep some perspective. Most of us came to Cassandra from rdbms worlds 
>where we were building solutions out of a bunch of master/slave MySQL / 
>Postgres type databases. I started using Cassandra 0.6 when I needed to store 
>something like 400gb/day in 200whatever on spinning disks when 100gb felt like 
>a “big” database, and the thought of writing runbooks and automation to 
>automatically pick the most up to date slave as the new master, promote it, 
>repoint the other slave to the new master, then reformat the old master and 
>add it as a new slave without downtime and without potentially deleting the 
>company’s whole dataset sounded awful. Cassandra solved that problem, at the 
>cost of maintaining a few yaml (then xml) files. Yes there are rough edges - 
>they get slightly less rough on each new release. Can we do better? Sure, use 
>your engineering time and send some patches. But the basic stuff is the nuts 
>and bolts of the database: I care way more about streaming and compaction than 
>I’ll ever care about installation.
>

I can relate.  I was studying the enterprise level MS SQL Server stuff. I 
noticed exactly what you described.  I decided maybe I'll just do other stuff 
and wait for things to develop more.  I'm very excited about the way Cassandra 
addresses things.  Streaming and compaction - very good.  I'm glad.  Items 
related to usability are not optional though.

>> I ask the Committee to compile a list of all such items, make a plan, and 
>> commit to including the completed and tested code as part of major release 
>> 5.0.  I further ask that release 4.0 not be delayed and then there be an 
>> unusually short skip to version 5.0.
>>
>
>The committers are working their ass off on all sorts of hard problems. Some 
>of those are probably even related to Cassandra. If you have an idea, open a 
>JIRA. If you have time, send a patch. Or review a patch. But don’t expect a 
>bunch of people to set down work on optimizing the database to work on 
>packaging and installation, because there’s no ROI in it for 99% of the 
>existing committers: we’re working on the database to solve problems, and 
>installation isn’t one of those problems.

I'm sure they are working very hard on all kinds of hard problems.  I actually 
wrote "Committee", not "committers".  There is an obvious shortage of 
contributors when you consider the size of the organizations using Cassandra.  
That leaves the burden on an unfair few.  Installation, or more generally I 
would say usability, is not that big a problem for the big companies out there. 
Good for them.

Ask a new organization or a modest-size organization that is struggling to 
manage its Cassandra cluster whether usability is not a big problem. It truly 
is a big problem for many stakeholders of Cassandra. It needs to be given a 
higher priority.  Hopefully others will weigh in.

Kenneth Brotman

