[
https://issues.apache.org/jira/browse/CASSANDRA-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094753#comment-13094753
]
Sylvain Lebresne commented on CASSANDRA-3003:
---------------------------------------------
I think this is a little bit sad to deserialize all the columns in the
non-counter case. We do need to do it right now because of the computation of
the max timestamp, but maybe we could have the other side send use the max
timestamp as part of the stream header (but I agree, it's a bit more
complicated).
For the record, the handling of counter columns amounts to the initial
proposition of Stu of moving the cleanup to the reads (though the solution is
slightly different). So the "we'll cleanup on each read before the sstable is
compacted" remark does hold here, but I don't see a better solution right now
and the "those sstables will likely be compacted quickly" argument probably
make this ok anyway.
Other comments:
* we need to use Integer.MIN_VALUE as the value for expireBefore when
deserializing the columns, otherwise the expired columns will be converted to
DeletedColumns, which will change there serialized size (and thus screw up the
data size and column index)
* for markDeltaAsDeleted, we must check if the length is already negative and
leave it so if it is, otherwise if a streamed sstable get re-streamed to
another node before it was compacted, we could end up not cleaning the delta
correctly.
* it would be nice in SSTW.appendFromStream() to assert the sanity of our
little deserialize-reserialize dance and assert what we did write the number of
bytes that we wrote in the header.
* the patch change a clearAllDelta to a markDeltaAsDeleted in CounterColumnTest
which is bogus (and the test does fail with that change).
* I would markDeltaAsDeleted to markForClearingDelta as this describe what the
function does better
* nitpick: there is a few space at end of lines in some comments (I know I
know, I'm picky).
> Trunk single-pass streaming doesn't handle large row correctly
> --------------------------------------------------------------
>
> Key: CASSANDRA-3003
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3003
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.0
> Reporter: Sylvain Lebresne
> Assignee: Yuki Morishita
> Priority: Critical
> Labels: streaming
> Fix For: 1.0
>
> Attachments: 3003-v1.txt, 3003-v2.txt, 3003-v3.txt, v3003-v4.txt
>
>
> For normal column family, trunk streaming always buffer the whole row into
> memory. In uses
> {noformat}
> ColumnFamily.serializer().deserializeColumns(in, cf, true, true);
> {noformat}
> on the input bytes.
> We must avoid this for rows that don't fit in the inMemoryLimit.
> Note that for regular column families, for a given row, there is actually no
> need to even recreate the bloom filter of column index, nor to deserialize
> the columns. It is enough to filter the key and row size to feed the index
> writer, but then simply dump the rest on disk directly. This would make
> streaming more efficient, avoid a lot of object creation and avoid the
> pitfall of big rows.
> Counters column family are unfortunately trickier, because each column needs
> to be deserialized (to mark them as 'fromRemote'). However, we don't need to
> do the double pass of LazilyCompactedRow for that. We can simply use a
> SSTableIdentityIterator and deserialize/reserialize input as it comes.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira