[ 
https://issues.apache.org/jira/browse/CASSANDRA-19334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-19334:
----------------------------------
    Change Category: Performance
         Complexity: Normal
        Component/s: Analytics Library
             Status: Open  (was: Triage Needed)

> [Analytics] Upgrade to Cassandra 4.0.12 and remove RowBufferMode and 
> BatchSize options
> --------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19334
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19334
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Analytics Library
>            Reporter: Yifan Cai
>            Assignee: Yifan Cai
>            Priority: Normal
>
> cassandra-all 4.0.12 includes improvements to CQLSSTableWriter: the sorted
> writer can now produce size-capped SSTables. This removes the need for the
> unsorted SSTable writer, which has to buffer data and sort it on flush.
> The dataset written by the Spark application is already sorted, so dropping
> the unsorted writer avoids wasting CPU time on sorting data that is already
> sorted. Because the sorted SSTable writer does not need to buffer data, its
> size estimation is more accurate than the unsorted writer's, so the produced
> SSTable files are closer to the expected size.
> Removing the unsorted SSTable writer makes the RowBufferMode option
> unnecessary.
> Supporting size-capping in the sorted writer makes the BatchSize option
> unnecessary.
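
To illustrate the idea in the description, here is a minimal, self-contained
sketch of size-capped writing over input that is already sorted. It is not the
CQLSSTableWriter API: the class name, the writeSorted method, and the use of
in-memory lists to stand in for SSTable files are all assumptions made for
illustration. The point is that rows are appended in arrival order and a new
output is started once the cap would be exceeded, so no sort-on-flush step is
needed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names do not come from Cassandra or the Analytics library.
class SizeCappedSortedWriterSketch {

    // Splits already-sorted rows into consecutive outputs of at most maxBytes each.
    // A single row larger than maxBytes still gets an output of its own.
    // Each inner list stands in for one SSTable file; a real writer would
    // append rows to disk instead of collecting them in memory.
    static List<List<String>> writeSorted(List<String> sortedRows, long maxBytes) {
        List<List<String>> outputs = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;

        for (String row : sortedRows) {
            long rowBytes = row.getBytes(StandardCharsets.UTF_8).length;
            // Roll over to a new output when adding this row would exceed the cap.
            if (!current.isEmpty() && currentBytes + rowBytes > maxBytes) {
                outputs.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            // Rows are appended in input order; no sorting happens here.
            current.add(row);
            currentBytes += rowBytes;
        }
        if (!current.isEmpty()) {
            outputs.add(current);
        }
        return outputs;
    }

    public static void main(String[] args) {
        // Five 2-byte rows with a 4-byte cap.
        List<String> rows = List.of("a1", "b2", "c3", "d4", "e5");
        System.out.println(writeSorted(rows, 4)); // prints [[a1, b2], [c3, d4], [e5]]
    }
}
```

Because the size of each output is tracked as rows are appended, the cap is
applied to what has actually been written rather than to an estimate of
buffered data, which is the accuracy benefit the description mentions.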



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

