[ 
https://issues.apache.org/jira/browse/CASSANDRA-10990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170295#comment-15170295
 ] 

Paulo Motta commented on CASSANDRA-10990:
-----------------------------------------

Thanks for the review, [~yukim]! Follow-up below:

bq. StreamDeserializer determines the tmp directory by checking whether the disk 
can hold 2 * the size of the streaming SSTable, but isn't that too much? An 
SSTable can be a few hundred GB. You need to cap that.

With the current implementation, each partition larger than 1MB is spilled to 
disk so it can be rewound later. In the worst case, if all partitions are >> 1MB, 
the spill file will be proportional to the sstable size. This is mostly a safety 
measure to check that the disk currently has space to hold the actual sstable 
plus, in the worst case, all partitions spilled to disk. Of course, even with 
this check, with other compactions/streams running you cannot guarantee 
anything, and the disk can occasionally fill up and chaos will ensue. An 
alternative approach is to overwrite the spill file for each partition, so we 
would only need disk space proportional to the maximum partition size, but that 
would probably cause write amplification. In any case, I was previously not 
checking whether the spill file returned by 
{{cfs.getDirectories().getTemporaryWriteableDirectoryAsFile}} was null (meaning 
there is no space), so I added that check and now throw an exception.
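
For reference, the added guard is roughly shaped like this (the write-size 
argument and the exception type/message below are illustrative, not a verbatim 
excerpt of the patch):

{code:java}
// Rough shape of the added check (illustrative only):
long maxSpillSize = 2 * totalSize; // worst case: the sstable itself plus every partition spilled
File spillDir = cfs.getDirectories().getTemporaryWriteableDirectoryAsFile(maxSpillSize);
if (spillDir == null)
    throw new IOException(String.format("Insufficient disk space to stream sstable: %d bytes required for the spill buffer",
                                        maxSpillSize));
{code}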

bq. Speaking of capping the size of the tmp buffer file, we may want to respect 
the parameter of InputStream#mark(int readLimit). We could just set the read 
limit to 2.147GB and make that the limitation of this functionality. Users can 
always run upgradesstables if needed.

I don't see why we should cap this if there is enough disk space, which is 
usually the case when bootstrapping/sstableloading. While I'm happy to add the 
cap if you feel strongly about it, I think it would render the functionality 
useless for dense nodes with {{STCS}}, which could otherwise still be served as 
long as there is sufficient disk space. Furthermore, I think in most cases 
partitions will be <= 1MB, so we will only spill to disk very rarely (and the 
disk space check should be sufficient protection for the other cases).

bq. RewindableInputStream only deletes the buffer file at close, but it is not 
closed when the stream finishes. You cannot close it with 
RewindableInputStream#close because that would cause the underlying socket to be 
closed.

Nice catch, thanks! I added a {{cleanup}} method to {{StreamDeserializer}} that 
performs the cleanup in the {{finally}} clause of {{StreamReader.read}} and 
{{CompressedStreamReader.read}} (unfortunately I couldn't reuse the {{close}} 
method of {{StreamDeserializer}}, because it's called between partitions by the 
try-with-resources clause of {{BigTableWriter.append}}). I also updated 
[dtests|https://github.com/pauloricardomg/cassandra-dtest/tree/10990] to check 
that the temporary spill files are cleaned up correctly.
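
After the change, the read path is roughly shaped like this (the loop helpers 
below are illustrative, not the real method names):

{code:java}
// Illustrative shape of StreamReader.read after the change:
try
{
    while (hasMorePartitions())                      // deserialize each incoming partition
        writer.append(deserializer.newPartition());  // close() runs here, per partition, via
                                                     // try-with-resources in BigTableWriter.append
}
finally
{
    deserializer.cleanup(); // deletes the temporary spill file; the underlying socket stays open
}
{code}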

bq. RewindableInputStream - I don't think we need SyncUtil.sync on buffer file.

Removed, thanks!

bq. RewindableDataInputStreamPlus - bytesPastMark returns the underlying 
stream's available(), but I don't think that is suitable there. available() (by 
contract) just returns an estimate of the bytes that can be read without 
blocking. You should return pos from RewindableInputStream. That said, I think 
it is better to integrate RewindableDataInputStreamPlus and 
RewindableInputStream.

Fixed {{bytesPastMark}} and integrated {{RewindableInputStream}} into 
{{RewindableDataInputStreamPlus}}.
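
As a toy illustration of the difference (this is not the actual class, just the 
counter-based idea):

{code:java}
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Toy sketch: track bytes read since mark() in a counter instead of relying on
// available(), which only estimates how many bytes can be read without blocking.
class CountingRewindSketch extends FilterInputStream
{
    private long bytesPastMark = 0;

    CountingRewindSketch(InputStream in) { super(in); }

    @Override
    public synchronized void mark(int readLimit)
    {
        super.mark(readLimit);
        bytesPastMark = 0; // start counting from the mark
    }

    @Override
    public synchronized void reset() throws IOException
    {
        super.reset();
        bytesPastMark = 0; // rewound back to the mark
    }

    @Override
    public int read() throws IOException
    {
        int b = super.read();
        if (b != -1)
            bytesPastMark++;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException
    {
        int n = super.read(buf, off, len);
        if (n > 0)
            bytesPastMark += n;
        return n;
    }

    public long bytesPastMark()
    {
        return bytesPastMark; // the position past the mark, not in.available()
    }
}
{code}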

bq. Initial capacity seems to be set to 128 bytes, not 128 KB. Also, I would 
prefer passing this value to the constructor as well so you don't have to deal 
with System.setProperty in unit tests.

I set the initial capacity to 32KB; do you think that is sufficient, or would 
you prefer 128KB?

I added the initial capacity to the constructor so we can use it in unit tests, 
but left the system properties so we can set them to a lower value in 
[dtests|https://github.com/pauloricardomg/cassandra-dtest/tree/10990].
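
The wiring is roughly like this (the class skeleton and property name below are 
illustrative, not the exact patch):

{code:java}
// Illustrative pattern: the initial capacity is a constructor argument for unit
// tests, but defaults to a value read from a system property so dtests can lower
// it without code changes.
public class RewindableBufferSketch
{
    static final int DEFAULT_INITIAL_MEM_BUFFER_SIZE =
        Integer.getInteger("cassandra.rewindable_stream_initial_mem_buffer_size", 32 * 1024);

    private byte[] memBuffer;

    public RewindableBufferSketch()
    {
        this(DEFAULT_INITIAL_MEM_BUFFER_SIZE);
    }

    public RewindableBufferSketch(int initialMemBufferSize)
    {
        this.memBuffer = new byte[initialMemBufferSize];
    }
}
{code}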

Updated the above branch with the review follow-up and resubmitted tests.

> Support streaming of older version sstables in 3.0
> --------------------------------------------------
>
>                 Key: CASSANDRA-10990
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10990
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Jeremy Hanna
>            Assignee: Paulo Motta
>
> In 2.0 we introduced support for streaming older versioned sstables 
> (CASSANDRA-5772). In 3.0, because of the rewrite of the storage layer, this is 
> no longer supported. So currently, while 3.0 can read sstables in the 2.1/2.2 
> format, it cannot stream the older versioned sstables. We should do some work 
> to make this possible again, consistent with what CASSANDRA-5772 provided.


