I love Spark. 3 or 4 years ago it was the first distributed computing environment that felt usable, and the community was welcoming.
But I just got back from the Reactive Summit, and this is what I observed:

- Industry leaders on stage making fun of Spark's streaming model
- Open source project leaders saying they looked at Spark's governance as a model to avoid
- Users saying they chose Flink because it was technically superior and they couldn't get any answers on the Spark mailing lists

Whether you agree with the substance of any of this, when this stuff gets repeated enough people will believe it.

Right now Spark is suffering from its own success, and I think something needs to change.

- We need a clear process for planning significant changes to the codebase. I'm not saying you need to adopt Kafka Improvement Proposals exactly, but you need a documented process with a clear outcome (e.g. a vote). Passing around google docs after an implementation has largely been decided on doesn't cut it.

- All technical communication needs to be public. Things getting decided in private chat, or when 1/3 of the committers work for the same company and can just talk to each other... Yes, it's convenient, but it's ultimately detrimental to the health of the project. The way structured streaming has played out has shown that there are significant technical blind spots (myself included). One way to address that is to get the people who have domain knowledge involved, and listen to them.

- We need more committers, and more committer diversity. Per committer there are, what, more than 20 contributors and 10 new jira tickets a month? It's too much. There are people (I am _not_ referring to myself) who have been around for years, contributed thousands of lines of code, helped educate the public around Spark... and yet are never going to be voted in.

- We need a clear process for managing volunteer work. Too many tickets sit around unowned, unclosed, uncertain. If someone proposed something and it isn't up to snuff, tell them and close it. It may be blunt, but it's clearer than "silent no". If someone wants to work on something, let them own the ticket and set a deadline. If they don't meet it, close it or reassign it.

This is not me putting on an Apache Bureaucracy hat. This is me saying, as a fellow hacker and loyal dissenter, something is wrong with the culture and process.

Please, let's change it.