Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "Operations" page has been changed by JonathanEllis.
The comment on this change is: move Streaming to its own page, and link it in 
Bootstrap and Move sections.
http://wiki.apache.org/cassandra/Operations?action=diff&rev1=39&rev2=40

--------------------------------------------------

   1. As a safety measure, Cassandra does not automatically remove data from 
nodes that "lose" part of their Token Range to a newly added node.  Run 
"nodetool cleanup" on the source node(s) (neighboring nodes that shared the 
same subrange) when you are satisfied the new node is up and working. If you do 
not do this, the old data will still be counted against the load on that node 
and future bootstrap attempts at choosing a location will be thrown off.
   1. When bootstrapping a new node, existing nodes have to divide the key 
space before beginning replication.  This can take a while, so be patient.
   1. During bootstrap, a node will drop the Thrift port and will not be 
accessible from `nodetool`.
+  1. Bootstrap can take many hours when a lot of data is involved.  See 
[[Streaming]] for how to monitor progress.
  
  Cassandra is smart enough to transfer data from the nearest source node(s), 
if your !EndpointSnitch is configured correctly.  So, the new node doesn't need 
to be in the same datacenter as the primary replica for the Range it is 
bootstrapping into, as long as another replica is in the datacenter with the 
new one.
  
@@ -79, +80 @@

  
  === Moving nodes ===
  `nodetool move`: move the target node to a given Token. Moving is 
essentially a convenience over decommission + bootstrap.
+ 
+ As with bootstrap, see [[Streaming]] for how to monitor progress.
  
  === Load balancing ===
  `nodetool loadbalance`: also essentially a convenience over decommission + 
bootstrap, only instead of telling the target node where to move on the ring it 
will choose its location based on the same heuristic as Token selection on 
bootstrap.
@@ -177, +180 @@

  FLUSH-WRITER-POOL                 0         0            218
  HINTED-HANDOFF-POOL               0         0            154
  }}}
- === Streaming ===
- The status of streaming on both origination and destination nodes can be 
monitored through the `org.apache.cassandra.streaming.StreamingService` MBean.
  
- The `Status` attribute gives an easy indication of what a node is doing with 
respect to streaming.  During the bulk of a transfer the sending node will 
report a status of `"Waiting for transfer to $some_node to complete."`  The 
receiving node will report `"Receiving stream"` while receiving stream data.  
The `StreamDestinations` and `StreamSources` attributes each contain a list of 
hosts that the current node is either sending stream data to or receiving it 
from.
- 
- The operations `getOutgoingFiles(host)` and `getIncomingFiles(host)` each 
return a list of strings describing the status of individual files being 
streamed to or from a given host.  Each string follows this format:  `[path to 
file] [bytes sent/received]/[file size]`.  If you think that streaming is taking 
too long on your cluster, the first thing you should do is check 
`StreamSources` or `StreamDestinations` to figure out which hosts are streaming 
files.  Use those hosts as inputs to `getOutgoingFiles()` or 
`getIncomingFiles()` to check on the status of individual files from the 
problematic source and destination nodes.  Streaming is conducted in 32MB 
chunks, so you should refresh the file status after a few seconds to see if the 
sent/received values change.  If they do not change, or change more slowly than 
you'd like, something is wrong.  Keep in mind that a source node can only 
stream a single file at a time, but a destination node can simultaneously 
receive several files.
- 
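The stall check described in the removed Streaming text (refresh the file status after a few seconds and see whether the sent/received values change) can be sketched in a few lines. This is only an illustration, not part of Cassandra: it assumes the status strings have already been fetched from `getOutgoingFiles(host)` or `getIncomingFiles(host)` by some JMX client, and that the bracketed fields in the documented format are placeholders, so a real string looks like `<path> <bytes>/<size>`. The names `FileProgress`, `parse_status`, and `stalled_files`, and the sample paths, are hypothetical.

```python
from typing import NamedTuple


class FileProgress(NamedTuple):
    path: str
    transferred: int  # bytes sent or received so far
    size: int         # total file size in bytes


def parse_status(line: str) -> FileProgress:
    """Parse one '<path> <bytes>/<size>' status string (assumed layout)."""
    # Split from the right so a path containing spaces stays intact.
    path, counts = line.strip().rsplit(" ", 1)
    transferred, size = counts.split("/")
    return FileProgress(path, int(transferred), int(size))


def stalled_files(before, after):
    """Return paths of unfinished files whose byte count did not advance."""
    earlier = {p.path: p.transferred for p in map(parse_status, before)}
    stalled = []
    for line in after:
        now = parse_status(line)
        unfinished = now.transferred < now.size
        if unfinished and earlier.get(now.path) == now.transferred:
            stalled.append(now.path)
    return stalled


# Two snapshots taken a few seconds apart (made-up sample data).
first = ["/data/a-Data.db 33554432/104857600", "/data/b-Data.db 0/1000"]
second = ["/data/a-Data.db 67108864/104857600", "/data/b-Data.db 0/1000"]
result = stalled_files(first, second)
```

With these sample snapshots, the first file advanced by one 32MB chunk and only the second file is reported. A file that appears only in the later snapshot is not treated as stalled, since there is nothing to compare it against.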
