Posted to https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Analytics+Roadmap - happy to discuss/edit further as well.
Doug

> On May 1, 2025, at 9:39 AM, Doug Rohrer <droh...@apple.com> wrote:
>
> Patrick,
>
> Thanks for the clarification - makes sense. I can put the contents here up on
> Confluence and we can work together to tweak it if necessary.
>
>> On Apr 30, 2025, at 11:15 AM, Patrick McFadin <pmcfa...@gmail.com> wrote:
>>
>> I'm not thinking that the Confluence page would be a status page or try to
>> get too close to being a tracker.
>>
>> My motivation here is for the millions of users not watching the project
>> intently and completely missing that this is happening. Case in point. I was
>> recently in a Reddit thread with a guy trying to build his own CDC mechanism
>> for Kafka topics. I pointed out that not only did sidecar exist, but maybe
>> he would like to contribute? It's this kind of non-coding activity that has
>> an awesome downstream effect on our project codebase by finding more
>> contributors/users. My thought for this page in Confluence is a
>> semi-dynamic page that explains what the project does, what's being worked
>> on and potential areas of contribution. The latter being the most dynamic.
>> If you have time, I can get on a Zoom with you, take some notes and put it
>> up. Doesn't have to be a big effort.
>>
>> Patrick
>>
>> On Wed, Apr 23, 2025 at 6:52 AM Doug Rohrer <droh...@apple.com> wrote:
>>> I put everything into Jira directly - there are two epics, one for the
>>> “Analytics 1.0” <https://issues.apache.org/jira/browse/CASSANALYTICS-21>
>>> release and one for “Cassandra 5.0 support”
>>> <https://issues.apache.org/jira/browse/CASSANALYTICS-23>, figuring that
>>> once we started work on these things (which some folks actually have) a
>>> Confluence page would quickly become out of date.
>>>
>>> If folks feel like there’s some value in putting something up there we
>>> could do that, but I think epics in Jira capture the plan fairly well.
>>>
>>> Thanks,
>>>
>>> Doug
>>>
>>>> On Apr 22, 2025, at 6:15 PM, Patrick McFadin <pmcfa...@gmail.com> wrote:
>>>>
>>>> Is the current roadmap published somewhere? I went to Confluence and
>>>> couldn't find anything.
>>>>
>>>> Patrick
>>>>
>>>> On Tue, Apr 22, 2025 at 10:53 AM Doug Rohrer <droh...@apple.com> wrote:
>>>>> Hello folks,
>>>>>
>>>>> As many of you on the ASF Slack may have noticed, I’ve been creating a
>>>>> bunch of new tickets for the Cassandra Analytics project related to a 1.0
>>>>> release. Since it was initially contributed, there have been many
>>>>> enhancements and fixes to the library, but there are still some gaps that
>>>>> need to be addressed. We’re putting together a plan to close those gaps,
>>>>> and would love to enlist more folks from the community in making the
>>>>> analytics library more useful. The gaps we see today include:
>>>>> - vnode support (and optimizations to the existing code if necessary to
>>>>>   make it work more efficiently with clusters using vnodes)
>>>>>   (CASSANALYTICS-10
>>>>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-10>)
>>>>> - Cassandra 5.0 support (this is an epic with lots of subtasks, some of
>>>>>   which are already being worked on by a variety of folks)
>>>>>   (CASSANALYTICS-23
>>>>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-23>)
>>>>> - Documentation, including both docs on cassandra.apache.org
>>>>>   <http://cassandra.apache.org/> and updated/improved developer docs in
>>>>>   the repository itself (CASSANALYTICS-6
>>>>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-6>)
>>>>> - Build scripts for release (CASSANALYTICS-22
>>>>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-22>)
>>>>> - Miscellaneous bug fixes of known issues/improvements
>>>>>   - Analytics writer should support all valid partition/clustering key
>>>>>     types (CASSANALYTICS-35
>>>>>     <https://issues.apache.org/jira/browse/CASSANALYTICS-35>)
>>>>>   - CassandraDataLayer uses configuration list of IPs instead of the full
>>>>>     ring/datacenter (CASSANALYTICS-20
>>>>>     <https://issues.apache.org/jira/browse/CASSANALYTICS-20>)
>>>>>   - Bulk Reader should dynamically calculate number of cores to use to
>>>>>     better utilize resources for smaller tables (CASSANALYTICS-36
>>>>>     <https://issues.apache.org/jira/browse/CASSANALYTICS-36>)
>>>>>
>>>>> Beyond 1.0, there are a lot of improvements and enhancements on the
>>>>> roadmap to date:
>>>>> - Cassandra 6.0 Support (CASSANALYTICS-37
>>>>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-37>)
>>>>> - Spark 4.0 support (CASSANALYTICS-34
>>>>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-34>)
>>>>> - JDK Support Matrix (CASSANALYTICS-38
>>>>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-38>)
>>>>> - Improved Compaction/Repair load for bulk writes (CASSANALYTICS-39
>>>>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-39>)
>>>>> - Bandwidth reduction (especially cross-dc writes) (CASSANALYTICS-40
>>>>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-40>)
>>>>> - Consolidation of SBW-on-S3 and DIRECT mode code (CASSANALYTICS-41
>>>>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-41>)
>>>>> - Bulk reads via S3 (CASSANALYTICS-42
>>>>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-42>)
>>>>>
>>>>> We’re also looking for input on what others think should be in the 1.0
>>>>> release, or the long-term roadmap. If you’ve got ideas, don’t hesitate to
>>>>> respond to this thread. I’ll also be checking the existing JIRAs and
>>>>> making sure they are incorporated into the plan, which I believe most are
>>>>> already.
>>>>>
>>>>> I want to thank the folks who have, so far, contributed most of the code
>>>>> for the Analytics library, and those in the community who have already
>>>>> started to use and improve it. We’re looking forward to getting more
>>>>> community members involved.
>>>>> If any of these items sounds interesting,
>>>>> please feel free to reach out to folks on Slack or reply on the dev list.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Doug Rohrer