[
https://issues.apache.org/jira/browse/CASSANDRA-579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791999#action_12791999
]
Jonathan Ellis commented on CASSANDRA-579:
------------------------------------------
What you want to be careful about here is not screwing up the
FileChannel.transferTo optimization, which is very valuable.
So IMO what you want to do is:
1. use SSTR.getPosition to find the start and end positions of the range to
transfer, then use the existing streaming API -- which already supports
streaming only _parts_ of files via transferTo -- to send that range over as
the data file in question.
2. from the data file, compute the index and bloom filter (BF) files on the
destination node, instead of wasting IO streaming those from the source.
- streaming the index from the source is possible, but you need to scan the
data file anyway to build the BF (there is no way to extract a subset of a
BF), so I think it's going to be simpler to just rebuild both. And anyway
goffinet has wanted a "rebuild index from data file" for a while now :)
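
A minimal sketch of the two steps above, in Java. This is not Cassandra's
streaming code: the class name RangeStreamSketch, both method names, and the
record layout (4-byte key length, key bytes, 4-byte value length, value bytes)
are assumptions made up for illustration, and the real SSTable format differs.
Only FileChannel.transferTo is an actual JDK call. The bloom filter is left
out; it would presumably be populated from the same keys during the same scan.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Cassandra.
final class RangeStreamSketch {

    // Step 1: send bytes [start, end) of the source file to the target
    // channel with transferTo, so the OS can avoid copying through user space.
    // Returns the number of bytes actually sent.
    static long transferRange(Path source, long start, long end,
                              WritableByteChannel target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
            long limit = Math.min(end, in.size());
            long position = start;
            while (position < limit) {
                // transferTo may send fewer bytes than requested, so loop.
                long sent = in.transferTo(position, limit - position, target);
                if (sent <= 0) {
                    break; // target accepted nothing; the caller sees a short count
                }
                position += sent;
            }
            return position - start;
        }
    }

    // Step 2: on the destination, scan the received data file once and record
    // the offset of each key. Assumed record layout (hypothetical):
    // [int keyLength][key bytes][int valueLength][value bytes].
    static Map<String, Long> rebuildIndex(Path dataFile) throws IOException {
        Map<String, Long> index = new LinkedHashMap<>();
        long size = Files.size(dataFile);
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(dataFile)))) {
            long offset = 0;
            while (offset < size) {
                int keyLength = in.readInt();
                byte[] key = new byte[keyLength];
                in.readFully(key);
                int valueLength = in.readInt();
                in.readFully(new byte[valueLength]); // value is read and discarded
                index.put(new String(key, StandardCharsets.UTF_8), offset);
                offset += 4L + keyLength + 4L + valueLength;
            }
        }
        return index;
    }
}
```

With start and end taken from something like SSTR.getPosition, the source
would call transferRange for each range, and the destination would call
rebuildIndex on the file it wrote, so neither the index nor the BF has to
cross the wire.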
> Add support to io.Streaming API for sending Streams
> ---------------------------------------------------
>
> Key: CASSANDRA-579
> URL: https://issues.apache.org/jira/browse/CASSANDRA-579
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Stu Hood
> Fix For: 0.9
>
>
> The io.Streaming API currently requires a file on disk to stream, which means
> that bootstrap and repairs need to perform an anti-compaction that writes a
> bunch of data to disk, only to have it be deleted after the streaming has
> finished.
> Ideally, the Streaming API should allow for streaming from an InputStream (or
> any other class we think we need to design to make the streaming as efficient
> as possible). That way, anti-compaction for repair/bootstrap does not perform
> any writing: it simply streams the relevant portion of the file to the
> neighbor.
> Additionally, this opens up interesting possibilities, such as providing the
> Streaming API as a (Java only?) client API. One use case would be for a
> Hadoop OutputFormat: rather than writing BinaryMemtables, the OutputFormat
> could literally write an SSTable to the stream. This might require better
> integration with gossip, to ensure that you aren't writing to the completely
> wrong node.