Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "Streaming" page has been changed by JonathanEllis.
The comment on this change is: describe streaming steps + anticompaction.
http://wiki.apache.org/cassandra/Streaming

--------------------------------------------------

New page:
When data needs to be moved from one node (the source) to another (the 
destination), the following steps occur:

 1. The destination sends the source a request listing the data ranges it wants.
 1. The source copies the data in those ranges into new sstable files in preparation for streaming. This is called anticompaction (compaction merges multiple sstable files into one; this does the opposite).
 1. The source sends the destination the list of files to be streamed, followed by the data.
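
The three steps above can be illustrated with a toy, in-memory model. This is a minimal sketch under stated assumptions, not Cassandra's implementation: the class, method, and file names (`StreamingSketch`, `antiCompact`, the `-stream` suffix) are hypothetical, tokens are plain `long` values, and an sstable is modeled as a sorted map from token to row. Ranges are assumed to be `(left, right]` and to wrap around the ring when `left >= right`.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Toy, in-memory model of the three streaming steps. All names are
 * hypothetical: tokens are plain longs and an "sstable" is a sorted map
 * from token to row. This is not Cassandra's implementation.
 */
public class StreamingSketch {

    /** Assumed token range (left, right]; wraps around the ring when left >= right. */
    public record Range(long left, long right) {
        public boolean contains(long token) {
            if (left < right) {
                return token > left && token <= right;
            }
            // Wrapping range: everything after left, plus everything up to right.
            return token > left || token <= right;
        }
    }

    /**
     * Step 2, anticompaction: for each source sstable, copy only the rows whose
     * tokens fall in the requested ranges into a new, smaller file.
     */
    public static Map<String, SortedMap<Long, String>> antiCompact(
            Map<String, SortedMap<Long, String>> sstables, List<Range> ranges) {
        Map<String, SortedMap<Long, String>> files = new TreeMap<>();
        for (Map.Entry<String, SortedMap<Long, String>> sstable : sstables.entrySet()) {
            SortedMap<Long, String> kept = new TreeMap<>();
            for (Map.Entry<Long, String> row : sstable.getValue().entrySet()) {
                if (ranges.stream().anyMatch(r -> r.contains(row.getKey()))) {
                    kept.put(row.getKey(), row.getValue());
                }
            }
            if (!kept.isEmpty()) {
                // Hypothetical naming scheme for the anticompacted file.
                files.put(sstable.getKey() + "-stream", kept);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        Map<String, SortedMap<Long, String>> source = new TreeMap<>();
        source.put("sst1", new TreeMap<>(Map.of(5L, "a", 15L, "b", 25L, "c")));
        source.put("sst2", new TreeMap<>(Map.of(12L, "d", 40L, "e")));

        // Step 1: the destination requests the range (10, 20].
        List<Range> requested = List.of(new Range(10, 20));

        // Step 2: the source anticompacts the requested data into new files.
        Map<String, SortedMap<Long, String>> files = antiCompact(source, requested);

        // Step 3: the source sends the file list first, then each file's data.
        System.out.println("file list: " + files.keySet());
        files.forEach((name, rows) -> System.out.println("sending " + name + ": " + rows));
    }
}
```

Modeling it this way shows why anticompaction is the mirror image of compaction: each source sstable is split, and only the rows the destination asked for are written out before anything crosses the network.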

Streaming status on both source and destination nodes can be monitored through
the `org.apache.cassandra.streaming.StreamingService` MBean. Its `Status`
attribute gives a quick indication of what a node is doing with respect to
streaming.

On most systems, step 2 (anticompaction) takes the most time. The destination
is idle during this stage; to monitor anticompaction progress, check the
`Compaction` MBean on the source.

Once step 3 begins the actual data transfer, the source node reports a status
of `"Waiting for transfer to $some_node to complete."` and the destination node
reports `"Receiving stream"` while it receives stream data. The
`StreamDestinations` attribute lists the hosts the current node is sending
stream data to, and `StreamSources` lists the hosts it is receiving stream data
from.
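
One way to read these attributes outside of a JMX console is a small JMX client. This is a minimal sketch under several assumptions: that the MBean is registered under the ObjectName `org.apache.cassandra.streaming:type=StreamingService`, that the node exposes JMX over RMI on port 8080 (an assumed default; check your node's configuration), and that JMX authentication is disabled. The class name `StreamingStatus` is hypothetical.

```java
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical helper that prints a node's streaming attributes over JMX. */
public class StreamingStatus {

    // Assumed ObjectName of the StreamingService MBean; confirm it with a JMX console.
    static final String STREAMING_MBEAN = "org.apache.cassandra.streaming:type=StreamingService";

    /** Standard RMI-based JMX service URL for the given host and port. */
    static JMXServiceURL serviceUrl(String host, int port) throws MalformedURLException {
        return new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8080; // assumed default port

        // JMXConnector is Closeable, so try-with-resources closes the connection.
        try (JMXConnector connector = JMXConnectorFactory.connect(serviceUrl(host, port))) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName name = new ObjectName(STREAMING_MBEAN);
            System.out.println("Status:             " + mbs.getAttribute(name, "Status"));
            System.out.println("StreamDestinations: " + mbs.getAttribute(name, "StreamDestinations"));
            System.out.println("StreamSources:      " + mbs.getAttribute(name, "StreamSources"));
        }
    }
}
```

Run it with the node's address and JMX port as arguments, for example `java StreamingStatus.java 10.0.0.1 8080`, once on the source and once on the destination.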

The operations `getOutgoingFiles(host)` and `getIncomingFiles(host)` each
return a list of strings describing the status of individual files being
streamed to and from a given host. Each string has the format
`[path to file] [bytes sent/received]/[file size]`.

If streaming seems to be taking too long on your cluster, first check
`StreamSources` or `StreamDestinations` to find out which hosts are streaming
files. Then pass the hosts from `StreamDestinations` to `getOutgoingFiles()`,
and those from `StreamSources` to `getIncomingFiles()`, to check the status of
individual files on the problematic source and destination nodes. Streaming is
conducted in 32MB chunks, so refresh the file status after a few seconds to see
whether the sent/received values change. If they do not change, or change more
slowly than you'd like, something is probably wrong. Keep in mind that a source
node can stream only a single file at a time, whereas a destination node can
receive several files simultaneously.
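
The troubleshooting procedure above can be partly automated by parsing the file status strings and comparing two polls taken a few seconds apart. This is a minimal sketch, assuming each string is exactly `[path] [transferred]/[size]` with numeric byte counts and no spaces after the path; `FileProgress`, `parse`, and `looksStalled` are hypothetical names, and the sample path in `main` is made up.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for file status strings of the form "<path> <transferred>/<size>". */
public class FileProgress {

    public record Status(String path, long transferred, long size) {
        public double fraction() {
            return size == 0 ? 1.0 : (double) transferred / size;
        }
    }

    public static Status parse(String line) {
        String s = line.trim();
        // Split on the last space: the path may contain spaces, the counts do not.
        int space = s.lastIndexOf(' ');
        if (space < 0) {
            throw new IllegalArgumentException("unexpected format: " + line);
        }
        String[] counts = s.substring(space + 1).split("/");
        if (counts.length != 2) {
            throw new IllegalArgumentException("unexpected format: " + line);
        }
        return new Status(s.substring(0, space), Long.parseLong(counts[0]), Long.parseLong(counts[1]));
    }

    /**
     * Heuristic stall check between two polls. A source sends one file at a
     * time, so queued files legitimately show no progress; report a stall only
     * when some file is unfinished and no file's byte count grew.
     */
    public static boolean looksStalled(List<String> before, List<String> after) {
        Map<String, Long> earlier = new HashMap<>();
        for (String line : before) {
            Status st = parse(line);
            earlier.put(st.path(), st.transferred());
        }
        boolean unfinished = false;
        for (String line : after) {
            Status st = parse(line);
            // A file that only appears in the second poll counts as starting from 0.
            long prev = earlier.getOrDefault(st.path(), 0L);
            if (st.transferred() > prev) {
                return false; // something moved since the last poll
            }
            if (st.transferred() < st.size()) {
                unfinished = true;
            }
        }
        return unfinished;
    }

    public static void main(String[] args) {
        Status st = parse("/var/lib/cassandra/data/Keyspace1/Standard1-1-Data.db 33554432/134217728");
        System.out.printf("%s: %.0f%% done%n", st.path(), st.fraction() * 100);
    }
}
```

Checking for any progress, rather than progress on every file, keeps the source side's one-file-at-a-time behavior from looking like a stall; the poll interval should still be long enough for at least one 32MB chunk to move.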
