That’s great, thanks Štefan. Please feel free to reach out on Slack or via email if you’ve got any questions.
Doug

> On Apr 23, 2025, at 2:04 AM, Štefan Miklošovič <smikloso...@apache.org> wrote:
>
> Hi Doug,
>
> I would love to help you with some of that. Spark 4.0 support seems appealing
> to me. Let me check with my "backend" if there is any capacity doing so and
> connecting privately to hash out the details.
>
> Regards
>
> On Tue, Apr 22, 2025 at 7:53 PM Doug Rohrer <droh...@apple.com> wrote:
>> Hello folks,
>>
>> As many of you on the ASF Slack may have noticed, I’ve been creating a bunch
>> of new tickets for the Cassandra Analytics project related to a 1.0 release.
>> Since it was initially contributed, there have been many enhancements and
>> fixes to the library, but there are still some gaps that need to be
>> addressed. We’re putting together a plan to close those gaps, and would love
>> to enlist more folks from the community in making the analytics library more
>> useful. The gaps we see today include:
>>
>> - vnode support (and optimizations to the existing code if necessary to make
>>   it work more efficiently with clusters using vnodes) (CASSANALYTICS-10
>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-10>)
>> - Cassandra 5.0 support (this is an epic with lots of subtasks, some of which
>>   are already being worked on by a variety of folks) (CASSANALYTICS-23
>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-23>)
>> - Documentation, including both docs on cassandra.apache.org
>>   <http://cassandra.apache.org/> and updated/improved developer docs in the
>>   repository itself (CASSANALYTICS-6
>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-6>)
>> - Build scripts for release (CASSANALYTICS-22
>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-22>)
>> - Miscellaneous bug fixes of known issues/improvements
>>   - Analytics writer should support all valid partition/clustering key types
>>     (CASSANALYTICS-35
>>     <https://issues.apache.org/jira/browse/CASSANALYTICS-35>)
>>   - CassandraDataLayer uses configuration list of IPs instead of the full
>>     ring/datacenter (CASSANALYTICS-20
>>     <https://issues.apache.org/jira/browse/CASSANALYTICS-20>)
>>   - Bulk Reader should dynamically calculate number of cores to use to better
>>     utilize resources for smaller tables (CASSANALYTICS-36
>>     <https://issues.apache.org/jira/browse/CASSANALYTICS-36>)
>>
>> Beyond 1.0, there are a lot of improvements and enhancements on the roadmap
>> to date:
>>
>> - Cassandra 6.0 Support (CASSANALYTICS-37
>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-37>)
>> - Spark 4.0 support (CASSANALYTICS-34
>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-34>)
>> - JDK Support Matrix (CASSANALYTICS-38
>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-38>)
>> - Improved Compaction/Repair load for bulk writes (CASSANALYTICS-39
>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-39>)
>> - Bandwidth reduction (especially cross-dc writes) (CASSANALYTICS-40
>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-40>)
>> - Consolidation of SBW-on-S3 and DIRECT mode code (CASSANALYTICS-41
>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-41>)
>> - Bulk reads via S3 (CASSANALYTICS-42
>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-42>)
>>
>> We’re also looking for input on what others think should be in the 1.0
>> release, or the long-term roadmap. If you’ve got ideas, don’t hesitate to
>> respond to this thread. I’ll also be checking the existing JIRAs and making
>> sure they are incorporated into the plan, which I believe most are already.
>>
>> I want to thank the folks who have, so far, contributed most of the code for
>> the Analytics library, and those in the community who have already started
>> to use and improve it. We’re looking forward to getting more community
>> members involved. If any of these items sounds interesting, please feel free
>> to reach out to folks on Slack or reply on the dev list.
>>
>> Thanks,
>>
>> Doug Rohrer