[ 
https://issues.apache.org/jira/browse/CASSANDRA-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094753#comment-13094753
 ] 

Sylvain Lebresne commented on CASSANDRA-3003:
---------------------------------------------

I think it is a little bit sad to deserialize all the columns in the 
non-counter case. We do need to do it right now because of the computation of 
the max timestamp, but maybe we could have the other side send us the max 
timestamp as part of the stream header (though I agree, that is a bit more 
complicated).
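
To make the idea concrete, here is a rough sketch (the class and names are made 
up, this is not the actual streaming code): the sender would compute the max 
timestamp while serializing the row and ship it in front of the row bytes, so 
the receiver can record it without deserializing a single column.
{noformat}
import java.io.*;

class PendingRowSketch
{
    // Sender side: ship the max timestamp next to the serialized row bytes.
    static void send(DataOutput out, long maxTimestamp, byte[] serializedRow) throws IOException
    {
        out.writeLong(maxTimestamp);        // computed while serializing, travels with the header
        out.writeInt(serializedRow.length);
        out.write(serializedRow);           // receiver can append these bytes verbatim
    }

    // Receiver side: grab the max timestamp for the sstable metadata,
    // then copy the remaining bytes straight to disk.
    static long receiveMaxTimestamp(DataInput in) throws IOException
    {
        return in.readLong();
    }
}
{noformat}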

For the record, the handling of counter columns amounts to Stu's initial 
proposition of moving the cleanup to the reads (though the solution is 
slightly different). So the "we'll clean up on each read before the sstable is 
compacted" remark does hold here, but I don't see a better solution right now, 
and the "those sstables will likely be compacted quickly" argument probably 
makes this ok anyway.

Other comments:
* we need to use Integer.MIN_VALUE as the value for expireBefore when 
deserializing the columns, otherwise the expired columns will be converted to 
DeletedColumns, which will change their serialized size (and thus screw up the 
data size and column index); see the first sketch after this list
* for markDeltaAsDeleted, we must check if the length is already negative and 
leave it so if it is, otherwise if a streamed sstable gets re-streamed to 
another node before it is compacted, we could end up not cleaning the delta 
correctly; see the second sketch after this list
* it would be nice in SSTW.appendFromStream() to assert the sanity of our 
little deserialize-reserialize dance, i.e. assert that we did write the number 
of bytes announced in the header
* the patch changes a clearAllDelta to a markDeltaAsDeleted in 
CounterColumnTest, which is bogus (and the test does fail with that change)
* I would rename markDeltaAsDeleted to markForClearingDelta, as that describes 
better what the function does
* nitpick: there are a few trailing spaces at the end of lines in some 
comments (I know, I know, I'm picky).
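
To illustrate the expireBefore point (illustrative only, not the real 
ColumnSerializer code): with Integer.MIN_VALUE nothing can compare as already 
expired, so an expiring column keeps its original serialized form instead of 
becoming a DeletedColumn of a different size.
{noformat}
class ExpireBeforeSketch
{
    // No localExpirationTime is smaller than Integer.MIN_VALUE, so this is
    // always false when expireBefore == Integer.MIN_VALUE and the column is
    // deserialized as-is.
    static boolean isExpired(int localExpirationTime, int expireBefore)
    {
        return localExpirationTime < expireBefore;
    }
}
{noformat}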
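And for the re-streaming point, the guard I have in mind looks roughly like 
this (made-up layout and names, not the actual CounterColumn code): only negate 
the length the first time, so marking an already-marked context doesn't flip it 
back to positive.
{noformat}
import java.nio.ByteBuffer;

class DeltaMarkerSketch
{
    static void markForClearingDelta(ByteBuffer context, int headerLengthOffset)
    {
        int length = context.getInt(headerLengthOffset);
        if (length > 0)   // already negative => already marked, leave it alone
            context.putInt(headerLengthOffset, -length);
    }
}
{noformat}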


> Trunk single-pass streaming doesn't handle large row correctly
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-3003
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3003
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0
>            Reporter: Sylvain Lebresne
>            Assignee: Yuki Morishita
>            Priority: Critical
>              Labels: streaming
>             Fix For: 1.0
>
>         Attachments: 3003-v1.txt, 3003-v2.txt, 3003-v3.txt, v3003-v4.txt
>
>
> For a normal column family, trunk streaming always buffers the whole row into 
> memory. It uses
> {noformat}
>   ColumnFamily.serializer().deserializeColumns(in, cf, true, true);
> {noformat}
> on the input bytes.
> We must avoid this for rows that don't fit in the inMemoryLimit.
> Note that for regular column families, for a given row, there is actually no 
> need to even recreate the bloom filter or column index, nor to deserialize 
> the columns. It is enough to read the key and row size to feed the index 
> writer, and then simply dump the rest on disk directly. This would make 
> streaming more efficient, avoid a lot of object creation and avoid the 
> pitfall of big rows.
> Counter column families are unfortunately trickier, because each column needs 
> to be deserialized (to mark them as 'fromRemote'). However, we don't need to 
> do the double pass of LazilyCompactedRow for that. We can simply use a 
> SSTableIdentityIterator and deserialize/reserialize the input as it comes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
