[jira] Commented: (CASSANDRA-579) Add support to io.Streaming API for sending Streams

Jonathan Ellis (JIRA) Thu, 17 Dec 2009 08:29:42 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791999#action_12791999 ]


Jonathan Ellis commented on CASSANDRA-579:
------------------------------------------

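What you want to be careful about here is not screwing up the 
FileChannel.transferTo optimization, which is very valuable: it lets the OS move 
file bytes straight from the filesystem cache to the socket without copying them 
through the JVM.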
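A minimal sketch of sending only part of a file with FileChannel.transferTo, the call being protected here. The names SectionTransfer and transferSection are made up for illustration, and a ByteArrayOutputStream stands in for the socket the streaming code would really write to. transferTo may send fewer bytes than requested in one call, so it is called in a loop.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SectionTransfer {
    /**
     * Sends bytes [start, end) of the file at `path` to `target` and returns
     * how many bytes were actually sent (fewer than end - start if the file
     * is shorter than `end`).
     */
    static long transferSection(Path path, long start, long end, WritableByteChannel target)
            throws IOException {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("bad section: [" + start + ", " + end + ")");
        }
        try (FileChannel in = FileChannel.open(path, StandardOpenOption.READ)) {
            long position = start;
            while (position < end) {
                // transferTo can return early, so keep asking for the remainder.
                long sent = in.transferTo(position, end - position, target);
                if (sent <= 0) {
                    break; // hit EOF before `end`: stop instead of spinning
                }
                position += sent;
            }
            return position - start;
        }
    }

    public static void main(String[] args) throws IOException {
        Path data = Files.createTempFile("section", ".db");
        Files.write(data, "0123456789".getBytes(StandardCharsets.US_ASCII));
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for a socket
        long n = transferSection(data, 3, 7, Channels.newChannel(sink));
        System.out.println(n + " bytes: " + new String(sink.toByteArray(), StandardCharsets.US_ASCII));
        // prints "4 bytes: 3456"
        Files.delete(data);
    }
}
```

Any actual zero-copy benefit depends on the target being something like a SocketChannel; with the in-memory sink used here the JDK simply copies. The point is that the data never has to pass through an application-level InputStream, which is what an API built only around streams would likely force for file-backed data.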

So IMO what you want to do is:

1. use SSTR.getPosition to find the start and end positions of the range to 
transfer, then use the existing streaming API (which already supports streaming 
only _parts_ of files via transferTo) to send that section over as the data file 
in question.

2. from the data file, rebuild the index and Bloom filter (BF) files on the 
destination node, instead of wasting IO streaming those from the source.
  - streaming the index from the source is possible, but you need to scan the 
data file anyway to build the BF (there is no way to extract a subset of a BF), 
so I think it's going to be simpler to just rebuild both.  And anyway goffinet 
has wanted a "rebuild index from data file" for a while now :)
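A sketch of step 2, assuming a deliberately simplified, hypothetical data file layout of [int keyLength][key][int valueLength][value] records. This is not Cassandra's actual SSTable format, and the in-memory TreeMap and small BitSet-based Bloom filter are stand-ins for the real index and BF files. It shows why rebuilding both is cheap once the data file must be scanned for the BF anyway: one sequential pass yields every key together with its offset.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.BitSet;
import java.util.NavigableMap;
import java.util.TreeMap;

public class RebuildFromData {
    /** Minimal Bloom filter: numHashes bit positions per key, derived by double hashing. */
    static final class BloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        BloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        private int position(String key, int i) {
            int h1 = key.hashCode();
            int h2 = (Integer.rotateLeft(h1, 16) * 0x9E3779B1) | 1; // second hash, forced odd
            return Math.floorMod(h1 + i * h2, numBits);
        }

        void add(String key) {
            for (int i = 0; i < numHashes; i++) bits.set(position(key, i));
        }

        /** false means "definitely absent"; true means "possibly present". */
        boolean mightContain(String key) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(position(key, i))) return false;
            }
            return true;
        }
    }

    /** Writes one record in the hypothetical layout used by this sketch. */
    static void writeRecord(DataOutputStream out, String key, byte[] value) throws IOException {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        out.writeInt(k.length);
        out.write(k);
        out.writeInt(value.length);
        out.write(value);
    }

    /** One pass over the data file fills both the key -> offset index and the filter. */
    static NavigableMap<String, Long> rebuild(Path dataFile, BloomFilter filter) throws IOException {
        NavigableMap<String, Long> index = new TreeMap<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(dataFile)))) {
            long offset = 0;
            while (true) {
                int keyLen;
                try {
                    keyLen = in.readInt();
                } catch (EOFException endOfData) {
                    break; // no more records
                }
                byte[] keyBytes = new byte[keyLen];
                in.readFully(keyBytes);
                int valueLen = in.readInt();
                in.readFully(new byte[valueLen]); // value is not needed for index or filter
                String key = new String(keyBytes, StandardCharsets.UTF_8);
                index.put(key, offset);
                filter.add(key);
                offset += 4 + keyLen + 4 + valueLen;
            }
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        Path data = Files.createTempFile("streamed", "-Data.db");
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(data)))) {
            writeRecord(out, "apple", "1".getBytes(StandardCharsets.UTF_8));
            writeRecord(out, "banana", "22".getBytes(StandardCharsets.UTF_8));
            writeRecord(out, "cherry", "333".getBytes(StandardCharsets.UTF_8));
        }
        BloomFilter filter = new BloomFilter(1 << 16, 5);
        System.out.println(rebuild(data, filter)); // {apple=0, banana=14, cherry=30}
        System.out.println(filter.mightContain("banana")); // true
        Files.delete(data);
    }
}
```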


> Add support to io.Streaming API for sending Streams
> ---------------------------------------------------
>
>                 Key: CASSANDRA-579
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-579
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Stu Hood
>             Fix For: 0.9
>
>
> The io.Streaming API currently requires a file on disk to stream, which means 
> that bootstrap and repairs need to perform an anti-compaction that writes a 
> bunch of data to disk, only to have it be deleted after the streaming has 
> finished.
> Ideally, the Streaming API should allow for streaming from an InputStream (or 
> any other class we think we need to design to make the streaming as efficient 
> as possible). That way, anti-compaction for repair/bootstrap does not perform 
> any writing: it simply streams the relevant portion of the file to the 
> neighbor.
> Additionally, this opens up interesting possibilities, such as providing the 
> Streaming API as a (Java only?) client API. One use case would be for a 
> Hadoop OutputFormat: rather than writing BinaryMemtables, the OutputFormat 
> could literally write an SSTable to the stream. This might require better 
> integration with gossip, to ensure that you aren't writing to the completely 
> wrong node.
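One possible shape for the API the issue asks for, sketched with hypothetical names (StreamSource, FileSectionSource, InputStreamSource) that are not part of Cassandra: a section of a file keeps the transferTo path, while an arbitrary InputStream falls back to a buffered channel copy using the standard read/flip/write/compact loop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

public class StreamSources {
    /** Hypothetical abstraction: anything the streaming layer can send to a neighbor. */
    interface StreamSource {
        /** Writes the whole source to target and returns the number of bytes written. */
        long writeTo(WritableByteChannel target) throws IOException;
    }

    /** Section [start, end) of an open data file: keeps the FileChannel.transferTo path. */
    static final class FileSectionSource implements StreamSource {
        private final FileChannel file;
        private final long start;
        private final long end;

        FileSectionSource(FileChannel file, long start, long end) {
            this.file = file;
            this.start = start;
            this.end = end;
        }

        @Override
        public long writeTo(WritableByteChannel target) throws IOException {
            long position = start;
            while (position < end) {
                long sent = file.transferTo(position, end - position, target);
                if (sent <= 0) break; // EOF before end: stop rather than spin
                position += sent;
            }
            return position - start;
        }
    }

    /** Arbitrary InputStream: no zero-copy path, so bytes go through a buffer in the JVM. */
    static final class InputStreamSource implements StreamSource {
        private final InputStream in;

        InputStreamSource(InputStream in) {
            this.in = in;
        }

        @Override
        public long writeTo(WritableByteChannel target) throws IOException {
            ReadableByteChannel source = Channels.newChannel(in); // not closed: caller owns `in`
            ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
            long total = 0;
            while (source.read(buffer) != -1) {
                buffer.flip();
                total += target.write(buffer);
                buffer.compact(); // keep any bytes the target did not accept yet
            }
            buffer.flip();
            while (buffer.hasRemaining()) { // drain whatever is left after EOF
                total += target.write(buffer);
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "hello stream".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        StreamSource source = new InputStreamSource(new ByteArrayInputStream(payload));
        long n = source.writeTo(Channels.newChannel(sink));
        System.out.println(n + " bytes: " + new String(sink.toByteArray(), StandardCharsets.US_ASCII));
        // prints "12 bytes: hello stream"
    }
}
```

The point of the split is that only the InputStream case pays for copying through the JVM; repair or bootstrap sending a section of an existing data file would still go through transferTo.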

