[
https://issues.apache.org/jira/browse/CASSANDRA-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yuki Morishita updated CASSANDRA-3003:
--------------------------------------
Attachment: mylyn-context.zip
> Trunk single-pass streaming doesn't handle large row correctly
> --------------------------------------------------------------
>
> Key: CASSANDRA-3003
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3003
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.0
> Reporter: Sylvain Lebresne
> Assignee: Yuki Morishita
> Priority: Critical
> Labels: streaming
> Fix For: 1.0
>
> Attachments: 3003-v1.txt, 3003-v2.txt, mylyn-context.zip
>
>
> For normal column families, trunk streaming always buffers the whole row into
> memory. It uses
> {noformat}
> ColumnFamily.serializer().deserializeColumns(in, cf, true, true);
> {noformat}
> on the input bytes.
> We must avoid this for rows that don't fit in the inMemoryLimit.
> Note that for regular column families, for a given row, there is actually no
> need to even recreate the bloom filter or the column index, nor to deserialize
> the columns. It is enough to read the key and row size to feed the index
> writer, and then simply dump the rest on disk directly. This would make
> streaming more efficient, avoid a lot of object creation, and avoid the
> pitfall of big rows.
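>
> A rough sketch of that single pass (plain java.io only; the key encoding, the
> index-writer hook, and the class/method names are illustrative assumptions,
> not Cassandra's actual streaming API):
> {noformat}
> import java.io.*;
>
> public class RawRowStreamer
> {
>     // Copies one serialized row from 'in' to 'out' without deserializing
>     // its columns: only the key and row size are read (enough to feed an
>     // index writer); the remaining bytes are dumped to disk directly.
>     public static void streamRow(DataInputStream in, DataOutputStream out) throws IOException
>     {
>         String key = in.readUTF();    // hypothetical on-disk key encoding
>         long rowSize = in.readLong(); // size of the remaining row data
>
>         // An index writer would record (key, output position) here.
>         out.writeUTF(key);
>         out.writeLong(rowSize);
>
>         // Bounded-buffer copy, so memory use is independent of row size.
>         byte[] buffer = new byte[64 * 1024];
>         long remaining = rowSize;
>         while (remaining > 0)
>         {
>             int read = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
>             if (read < 0)
>                 throw new EOFException(remaining + " bytes of row data missing");
>             out.write(buffer, 0, read);
>             remaining -= read;
>         }
>     }
> }
> {noformat}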
> Counter column families are unfortunately trickier, because each column needs
> to be deserialized (to mark it as 'fromRemote'). However, we don't need to
> do the double pass of LazilyCompactedRow for that. We can simply use an
> SSTableIdentityIterator and deserialize/reserialize the input as it comes.
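>
> A minimal sketch of that counter path, again with a made-up per-column wire
> format and a placeholder for the 'fromRemote' rewrite (the real code would go
> through SSTableIdentityIterator and the counter context, not this):
> {noformat}
> import java.io.*;
>
> public class CounterRowRelay
> {
>     // Single-pass relay for a counter row: each column is deserialized just
>     // long enough to rewrite its remote state, then written straight back
>     // out. No whole-row buffering, no second pass over the data.
>     public static void relayCounterRow(DataInputStream in, DataOutputStream out) throws IOException
>     {
>         int columnCount = in.readInt();
>         out.writeInt(columnCount);
>
>         for (int i = 0; i < columnCount; i++)
>         {
>             // Assumed wire format: name, timestamp, value length, value.
>             String name = in.readUTF();
>             long timestamp = in.readLong();
>             byte[] value = new byte[in.readInt()];
>             in.readFully(value);
>
>             byte[] marked = markFromRemote(value);
>
>             out.writeUTF(name);
>             out.writeLong(timestamp);
>             out.writeInt(marked.length);
>             out.write(marked);
>         }
>     }
>
>     // Placeholder: real code would clear local shards in the counter value.
>     private static byte[] markFromRemote(byte[] value)
>     {
>         return value;
>     }
> }
> {noformat}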
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira