Trunk single-pass streaming doesn't handle large rows correctly
---------------------------------------------------------------
Key: CASSANDRA-3003
URL: https://issues.apache.org/jira/browse/CASSANDRA-3003
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Sylvain Lebresne
Assignee: Yuki Morishita
For normal column families, trunk streaming always buffers the whole row into
memory. It uses
{noformat}
ColumnFamily.serializer().deserializeColumns(in, cf, true, true);
{noformat}
on the input bytes.
We must avoid this for rows that don't fit in the inMemoryLimit.
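For context, a rough sketch of the receiving side as it stands (only the deserializeColumns call above is verbatim; the surrounding names are illustrative):
{noformat}
// Approximate shape of the current path: the whole row is materialized
// as a ColumnFamily object before it is appended, so heap usage grows
// with the row size.
ColumnFamily cf = ColumnFamily.create(metadata);
ColumnFamily.serializer().deserializeColumns(in, cf, true, true); // buffers every column
writer.append(key, cf); // 'writer' stands in for the receiving sstable writer
{noformat}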
Note that for regular column families, for a given row, there is actually no
need to even recreate the bloom filter or the column index, nor to deserialize
the columns. It is enough to read the key and row size to feed the index
writer, and then simply dump the rest to disk directly. This would make
streaming more efficient, avoid a lot of object creation, and avoid the pitfall
of big rows.
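A minimal sketch of that direct-copy path, assuming a DataInput 'in' positioned at the start of the row and an index writer with an append(key, position) hook (the indexWriter/out names are placeholders, not actual trunk APIs):
{noformat}
// Feed the index writer with just the key and position, then copy the
// serialized row to disk verbatim, never deserializing its columns.
ByteBuffer keyBytes = ByteBufferUtil.readWithShortLength(in);
long rowSize = in.readLong();
indexWriter.append(partitioner.decorateKey(keyBytes), out.getFilePointer()); // placeholder hook

byte[] buffer = new byte[64 * 1024];
long remaining = rowSize;
while (remaining > 0)
{
    int toRead = (int) Math.min(buffer.length, remaining);
    in.readFully(buffer, 0, toRead); // raw bytes: bloom filter, column index, columns
    out.write(buffer, 0, toRead);
    remaining -= toRead;
}
{noformat}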
Counter column families are unfortunately trickier, because each column needs
to be deserialized (to mark it as 'fromRemote'). However, we don't need the
double pass of LazilyCompactedRow for that. We can simply use an
SSTableIdentityIterator and deserialize/re-serialize the input as it comes.
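A sketch of that single-pass counter path (SSTableIdentityIterator is a real trunk class; markAsFromRemote, columnSerializer and the constructor arguments here are placeholders for whatever the real wiring needs):
{noformat}
// Deserialize one column at a time, mark it, and re-serialize it
// immediately, so memory stays bounded regardless of row size.
SSTableIdentityIterator row = new SSTableIdentityIterator(sstable, in, key, 0, rowSize, true);
while (row.hasNext())
{
    IColumn column = row.next();
    if (column instanceof CounterColumn)
        column = markAsFromRemote((CounterColumn) column); // placeholder for the 'fromRemote' marking
    columnSerializer.serialize(column, out); // write it straight back out
}
{noformat}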