[
https://issues.apache.org/jira/browse/CASSANDRA-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096080#comment-13096080
]
Sylvain Lebresne commented on CASSANDRA-3122:
---------------------------------------------
bq. I don't understand how the changes to writeRow work without doing anything
to cF b/s asking for its serializedSize
In (the new method) getColumnFamily, when we reuse a previous column family to
add new columns to it, we start by removing its size from the estimate, so
that when writeRow is called on the updated cf, adding back the whole size
still gives a good estimate (actually a better one than before, because we no
longer count the row key multiple times).
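A minimal sketch of the bookkeeping described above (hypothetical class and method names, not the actual patch): reusing a buffered row first subtracts its previously counted size from the running estimate, and writing the row adds its whole updated size back, so each row's size is counted exactly once.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the size-estimate accounting: subtract a
// reused row's old size before columns are added, add the full new size
// back when the row is written.
class BufferSizeEstimate {
    private long currentSize = 0;
    // row key -> serialized size currently counted in the estimate
    private final Map<String, Long> rowSizes = new HashMap<>();

    // Mirrors getColumnFamily: when reusing a buffered row, remove its
    // previous size so it is not counted twice. Returns the removed size.
    long reuseRow(String key) {
        long previous = rowSizes.getOrDefault(key, 0L);
        currentSize -= previous;
        return previous;
    }

    // Mirrors writeRow: add the row's whole (possibly updated) serialized
    // size back to the estimate.
    void writeRow(String key, long serializedSize) {
        rowSizes.put(key, serializedSize);
        currentSize += serializedSize;
    }

    long currentSize() {
        return currentSize;
    }
}
```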
> SSTableSimpleUnsortedWriter take long time when inserting big rows
> ------------------------------------------------------------------
>
> Key: CASSANDRA-3122
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3122
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.8.3
> Reporter: Benoit Perroud
> Priority: Minor
> Fix For: 0.8.5
>
> Attachments: 3122.patch, SSTableSimpleUnsortedWriter-v2.patch,
> SSTableSimpleUnsortedWriter.patch
>
>
> In SSTableSimpleUnsortedWriter, when dealing with rows that have many
> columns, if we call newRow several times (to flush data as soon as possible),
> the time taken by each newRow() call grows non-linearly. This is because
> every call to newRow merges the new column family into the ever-growing
> existing one.
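A rough cost model of the behavior described in the issue (an illustration, not Cassandra code): re-merging every new batch of columns into one growing column family touches all previously buffered columns each time, so total work grows quadratically, whereas counting each column once would be linear.

```java
// Hypothetical cost model contrasting repeated merge into a growing CF
// (columns touched grows quadratically in the number of batches) with
// touching each column only once.
class MergeCost {
    // Merging a batch into a CF that already holds `existing` columns
    // touches existing + batch columns; the CF grows after every merge.
    static long mergeIntoGrowingCf(int batches, int batchSize) {
        long touched = 0;
        long existing = 0;
        for (int i = 0; i < batches; i++) {
            touched += existing + batchSize; // whole CF re-merged each call
            existing += batchSize;
        }
        return touched;
    }

    // Buffering each batch independently touches each column exactly once.
    static long bufferPerBatch(int batches, int batchSize) {
        return (long) batches * batchSize;
    }
}
```

With 1000 batches of 10 columns, the growing-merge model touches roughly 5 million columns versus 10,000 for the linear model, which matches the non-linear slowdown reported.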
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira