[ https://issues.apache.org/jira/browse/CASSANDRA-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083123#comment-13083123 ]
Sylvain Lebresne commented on CASSANDRA-3003:
---------------------------------------------
bq. Can we pad it somehow?
It's doable. Basically a context is an array of shards, with a header that is a
(variable-length) list of which of those shards are deltas. When we clean up
the deltas, we basically remove the header. We could have a specific cleanup
for streaming that just sets all the header entries to -1. But we probably want
to do that only for the cleanup during streaming, and have compaction clean
those up afterwards; otherwise it is ugly. I don't know how much easier it is
than cleaning during reads, though it avoids having to add new info to the
sstable metadata.
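For illustration, a minimal sketch of that "set all the header entries to -1"
cleanup, assuming a hypothetical layout (a short header length followed by that
many short delta indices, then the shards); the real counter context encoding
differs:
{noformat}
import java.nio.ByteBuffer;

// Hypothetical layout only: [short headerLen][headerLen shorts: delta indices][shards...]
public class ContextCleanupSketch
{
    // Streaming cleanup: keep the header in place (context length unchanged)
    // but overwrite every delta index with -1 so readers treat no shard as a
    // delta; compaction can strip the -1 entries later.
    static void clearDeltaFlags(ByteBuffer context)
    {
        int headerLen = context.getShort(context.position());
        for (int i = 0; i < headerLen; i++)
            context.putShort(context.position() + 2 + 2 * i, (short) -1);
    }

    public static void main(String[] args)
    {
        // header claims shards 0 and 2 are deltas; shard payload omitted
        ByteBuffer ctx = ByteBuffer.allocate(6);
        ctx.putShort((short) 2).putShort((short) 0).putShort((short) 2);
        ctx.flip();
        clearDeltaFlags(ctx);
        System.out.println(ctx.getShort(2) + " " + ctx.getShort(4)); // prints: -1 -1
    }
}
{noformat}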
> Trunk single-pass streaming doesn't handle large rows correctly
> ---------------------------------------------------------------
>
> Key: CASSANDRA-3003
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3003
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Sylvain Lebresne
> Assignee: Yuki Morishita
> Priority: Critical
> Labels: streaming
>
> For normal column families, trunk streaming always buffers the whole row into
> memory. It uses
> {noformat}
> ColumnFamily.serializer().deserializeColumns(in, cf, true, true);
> {noformat}
> on the input bytes.
> We must avoid this for rows that don't fit in the inMemoryLimit.
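> To make the problem concrete, a simplified sketch (invented names and wire
> format, not the actual trunk code) of what a deserializeColumns-style loop
> does; every column is materialized before anything is written out, so memory
> use grows with the size of the row:
> {noformat}
> import java.io.DataInput;
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
>
> class BufferingDeserializeSketch
> {
>     static List<byte[]> deserializeColumns(DataInput in) throws IOException
>     {
>         int count = in.readInt();
>         List<byte[]> columns = new ArrayList<byte[]>(count);
>         for (int i = 0; i < count; i++)
>         {
>             byte[] column = new byte[in.readInt()];
>             in.readFully(column);   // entire column held in memory...
>             columns.add(column);    // ...and kept for the whole row
>         }
>         return columns;
>     }
> }
> {noformat}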
> Note that for regular column families, for a given row, there is actually no
> need to even recreate the bloom filter or column index, nor to deserialize
> the columns. It is enough to read the key and row size to feed the index
> writer, and then simply dump the rest to disk directly. This would make
> streaming more efficient, avoid a lot of object creation, and avoid the
> pitfall of big rows.
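> A hedged sketch of this pass-through idea (the key-length and row-size
> encodings, and the appendToIndex hook, are invented for illustration):
> {noformat}
> import java.io.DataInputStream;
> import java.io.DataOutputStream;
> import java.io.EOFException;
> import java.io.IOException;
>
> class PassThroughStreamSketch
> {
>     // Read just the key and row size (enough to feed the index writer),
>     // then copy the remaining row bytes straight to disk: no column
>     // deserialization, no bloom filter or column index rebuild.
>     static void streamRow(DataInputStream in, DataOutputStream out) throws IOException
>     {
>         int keyLength = in.readShort() & 0xFFFF;   // assumed 2-byte key length
>         byte[] key = new byte[keyLength];
>         in.readFully(key);
>         long rowSize = in.readLong();              // assumed 8-byte row size
>
>         out.writeShort(keyLength);
>         out.write(key);
>         out.writeLong(rowSize);
>         appendToIndex(key, out.size());            // hypothetical index-writer hook
>
>         byte[] buf = new byte[4096];               // at most one buffer in memory
>         long remaining = rowSize;
>         while (remaining > 0)
>         {
>             int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
>             if (n < 0) throw new EOFException();
>             out.write(buf, 0, n);
>             remaining -= n;
>         }
>     }
>
>     static void appendToIndex(byte[] key, long position) { /* hypothetical */ }
> }
> {noformat}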
> Counter column families are unfortunately trickier, because each column needs
> to be deserialized (to mark it as 'fromRemote'). However, we don't need to do
> the double pass of LazilyCompactedRow for that. We can simply use an
> SSTableIdentityIterator and deserialize/reserialize the input as it comes.
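> A similarly hedged sketch of that single-pass counter path (the per-column
> wire format and the 'fromRemote' flag bit are invented for illustration):
> {noformat}
> import java.io.DataInputStream;
> import java.io.DataOutputStream;
> import java.io.IOException;
>
> class CounterStreamSketch
> {
>     // In the spirit of SSTableIdentityIterator: deserialize each column as
>     // it arrives, mark it as coming from a remote node, and reserialize it
>     // immediately; no LazilyCompactedRow-style double pass, and only one
>     // column is in memory at a time.
>     static void streamCounterRow(DataInputStream in, DataOutputStream out) throws IOException
>     {
>         int columnCount = in.readInt();
>         out.writeInt(columnCount);
>         for (int i = 0; i < columnCount; i++)
>         {
>             int length = in.readInt();
>             byte[] column = new byte[length];   // one column in memory at a time
>             in.readFully(column);
>             column[0] |= 0x01;                  // hypothetical 'fromRemote' flag bit
>             out.writeInt(length);
>             out.write(column);
>         }
>     }
> }
> {noformat}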