Re: A Roadmap to Cassandra Analytics 1.0

Doug Rohrer Wed, 23 Apr 2025 07:34:45 -0700

That’s great, thanks Štefan. Please feel free to reach out on Slack or via 
email if you’ve got any questions.


Doug

> On Apr 23, 2025, at 2:04 AM, Štefan Miklošovič <[email protected]> wrote:
> 
> Hi Doug,
> 
> I would love to help you with some of that. Spark 4.0 support seems appealing 
> to me. Let me check with my "backend" whether there is any capacity to do so, 
> and then connect with you privately to hash out the details.
> 
> Regards
> 
> On Tue, Apr 22, 2025 at 7:53 PM Doug Rohrer <[email protected]> wrote:
>> Hello folks,
>> 
>> As many of you on the ASF Slack may have noticed, I’ve been creating a bunch 
>> of new tickets for the Cassandra Analytics project related to a 1.0 release. 
>> Since it was initially contributed, there have been many enhancements and 
>> fixes to the library, but there are still some gaps that need to be 
>> addressed. We’re putting together a plan to close those gaps, and would love 
>> to enlist more folks from the community in making the analytics library more 
>> useful. The gaps we see today include:
>> - vnode support (and optimizations to the existing code, if necessary, to 
>>   make it work more efficiently with clusters using vnodes) (CASSANALYTICS-10 
>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-10>)
>> - Cassandra 5.0 support (this is an epic with lots of subtasks, some of which 
>>   are already being worked on by a variety of folks) (CASSANALYTICS-23 
>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-23>)
>> - Documentation, including both docs on cassandra.apache.org 
>>   <http://cassandra.apache.org/> and updated/improved developer docs in the 
>>   repository itself (CASSANALYTICS-6 
>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-6>)
>> - Build scripts for release (CASSANALYTICS-22 
>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-22>)
>> - Miscellaneous bug fixes of known issues/improvements
>>   - Analytics writer should support all valid partition/clustering key types 
>>     (CASSANALYTICS-35 <https://issues.apache.org/jira/browse/CASSANALYTICS-35>)
>>   - CassandraDataLayer uses configuration list of IPs instead of the full 
>>     ring/datacenter (CASSANALYTICS-20 
>>     <https://issues.apache.org/jira/browse/CASSANALYTICS-20>)
>>   - Bulk Reader should dynamically calculate number of cores to use to better 
>>     utilize resources for smaller tables (CASSANALYTICS-36 
>>     <https://issues.apache.org/jira/browse/CASSANALYTICS-36>)
>> 
>> Beyond 1.0, there are a lot of improvements and enhancements on the roadmap 
>> to date:
>> - Cassandra 6.0 Support (CASSANALYTICS-37 
>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-37>)
>> - Spark 4.0 support (CASSANALYTICS-34 
>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-34>)
>> - JDK Support Matrix (CASSANALYTICS-38 
>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-38>)
>> - Improved Compaction/Repair load for bulk writes (CASSANALYTICS-39 
>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-39>)
>> - Bandwidth reduction (especially cross-dc writes) (CASSANALYTICS-40 
>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-40>)
>> - Consolidation of SBW-on-S3 and DIRECT mode code (CASSANALYTICS-41 
>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-41>)
>> - Bulk reads via S3 (CASSANALYTICS-42 
>>   <https://issues.apache.org/jira/browse/CASSANALYTICS-42>)
>> 
>> We’re also looking for input on what others think should be in the 1.0 
>> release, or the long-term roadmap. If you’ve got ideas, don’t hesitate to 
>> respond to this thread. I’ll also be checking the existing JIRAs and making 
>> sure they are incorporated into the plan, which I believe most are already.
>> 
>> I want to thank the folks who have, so far, contributed most of the code for 
>> the Analytics library, and those in the community who have already started 
>> to use and improve it. We’re looking forward to getting more community 
>> members involved. If any of these items sounds interesting, please feel free 
>> to reach out to folks on Slack or reply on the dev list.
>> 
>> Thanks,
>> 
>> Doug Rohrer
