[ https://issues.apache.org/jira/browse/CASSANDRA-19334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yifan Cai updated CASSANDRA-19334:
----------------------------------
Change Category: Performance
Complexity: Normal
Component/s: Analytics Library
Status: Open (was: Triage Needed)
> [Analytics] Upgrade to Cassandra 4.0.12 and remove RowBufferMode and
> BatchSize options
> --------------------------------------------------------------------------------------
>
> Key: CASSANDRA-19334
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19334
> Project: Cassandra
> Issue Type: Improvement
> Components: Analytics Library
> Reporter: Yifan Cai
> Assignee: Yifan Cai
> Priority: Normal
>
> In cassandra-all:4.0.12, CQLSSTableWriter was improved: the sorted writer can
> now produce size-capped SSTables. This removes the need for the unsorted
> SSTable writer, which has to buffer data and sort it on flush.
> The dataset written by the Spark application is already sorted, so avoiding
> the unsorted writer saves the CPU time otherwise spent re-sorting sorted
> data. Because the sorted SSTable writer does not need to buffer data, its
> size estimation is more accurate than the unsorted writer's, so the produced
> SSTable files are closer to the expected size.
> With the unsorted SSTable writer removed, the RowBufferMode option is no
> longer required.
> With size-capping supported in the sorted writer, the BatchSize option is no
> longer required.
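
The difference between the two write paths described above can be modelled with a small, self-contained sketch. This is an illustration only and not Cassandra's implementation: the row shape, `row_size`, `write_sorted_capped`, and `write_unsorted_buffered` are hypothetical names, and the "SSTables" are plain Python lists. It assumes the input rows arrive already sorted, as the description says the Spark dataset does.

```python
from typing import Iterable, List, Tuple

# Hypothetical row shape for illustration: (sort key, payload bytes).
Row = Tuple[int, bytes]


def row_size(row: Row) -> int:
    """Rough size estimate: 8 bytes for the key plus the payload length."""
    return 8 + len(row[1])


def write_sorted_capped(rows: Iterable[Row], max_bytes: int) -> List[List[Row]]:
    """Stream already-sorted rows into chunks, rolling over at a size cap.

    Nothing is buffered for sorting; a new chunk starts as soon as the next
    row would push the current chunk past max_bytes.
    """
    chunks: List[List[Row]] = []
    current: List[Row] = []
    current_bytes = 0
    for row in rows:
        size = row_size(row)
        if current and current_bytes + size > max_bytes:
            chunks.append(current)
            current, current_bytes = [], 0
        current.append(row)
        current_bytes += size
    if current:
        chunks.append(current)
    return chunks


def write_unsorted_buffered(rows: Iterable[Row], batch_size: int) -> List[List[Row]]:
    """Buffer batch_size rows, then sort and flush them as one chunk.

    The sort runs on every flush even when the input is already ordered,
    which is the wasted work the description refers to.
    """
    chunks: List[List[Row]] = []
    buffer: List[Row] = []
    for row in rows:
        buffer.append(row)
        if len(buffer) >= batch_size:
            chunks.append(sorted(buffer))
            buffer = []
    if buffer:
        chunks.append(sorted(buffer))
    return chunks


# Ten sorted rows of 100 estimated bytes each (8-byte key + 92-byte payload).
rows = [(i, b"x" * 92) for i in range(10)]
capped = write_sorted_capped(rows, max_bytes=250)
buffered = write_unsorted_buffered(rows, batch_size=4)
```

In this model the size cap alone decides where one output ends and the next begins, so a row-count batch setting and a buffering mode have nothing left to control. That mirrors the reasoning given for dropping BatchSize and RowBufferMode.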