Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "ArchitectureInternals" page has been changed by TylerHobbs:
https://wiki.apache.org/cassandra/ArchitectureInternals?action=diff&rev1=33&rev2=34

Comment:
Fix description of how batchlog nodes are chosen

    * If nodes are changing position on the ring, "pending ranges" are 
associated with their destinations in !TokenMetadata and these are also written 
to.
    * ConsistencyLevel determines how many replies to wait for.  See 
!WriteResponseHandler.determineBlockFor.  Interaction with pending ranges is a 
bit tricky; see https://issues.apache.org/jira/browse/CASSANDRA-833
    * If the FailureDetector says that we don't have enough nodes alive to 
satisfy the ConsistencyLevel, we fail the request with !UnavailableException
-   * When performing atomic batches, the mutations are written to the batchlog 
on the two closest nodes in the local datacenter that are alive. If only one 
other node is alive, it alone will be used, but if no other nodes are alive, an 
UnavailableException will be returned.  If the cluster has only one node, it 
will write the batchlog entry itself.  The batchlog is contained in the 
system.batchlog table.
+   * When performing atomic batches, the mutations are written to the batchlog 
on two live nodes in the local datacenter. If the local datacenter contains 
multiple racks, the nodes will be chosen from two separate racks that are 
different from the coordinator's rack, when possible.  If only one other node 
is alive, it alone will be used, but if no other nodes are alive, an 
!UnavailableException will be returned unless the consistency level is ANY.  If 
the cluster has only one node, it will write the batchlog entry itself.  The 
batchlog is contained in the system.batchlog table.
    * If the FD gives us the okay but writes time out anyway because of a 
failure after the request is sent or because of an overload scenario, 
!StorageProxy will write a "hint" locally so the write can be replayed when 
the timed-out replica(s) recover.  This is called HintedHandoff.  Note that HH 
does not prevent inconsistency entirely; either unclean shutdown or hardware 
failure can prevent the coordinating node from writing or replaying the hint. 
ArchitectureAntiEntropy is responsible for restoring consistency more 
completely.
    * Cross-datacenter writes are not sent directly to each replica; instead, 
they are sent to a single replica with a parameter in !MessageOut telling that 
replica to forward to the other replicas in that datacenter; those replicas 
will respond directly to the original coordinator.
    * On the destination node, !RowMutationVerbHandler calls RowMutation.apply() 
(which calls Keyspace.apply()) to make the mutation.  This has several steps.  
First, an entry is appended to the CommitLog (potentially blocking if the 
CommitLog is in batch sync mode, or if the queue is full in periodic sync 
mode).  Next, the Memtable, secondary indexes (if applicable), and row cache 
are updated (sequentially) for each ColumnFamily in the mutation.
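The rack-aware batchlog node selection described in the updated bullet above can be sketched as follows. This is a minimal illustration of the selection rule only, not Cassandra's actual !BatchlogManager code; the Node type and method names here are hypothetical:

```java
import java.util.*;
import java.util.stream.*;

public class BatchlogCandidates {
    // Hypothetical stand-in for a cluster member; not a real Cassandra class.
    record Node(String name, String rack, boolean alive) {}

    /**
     * Sketch of the rule described above: from the live nodes in the local
     * datacenter, prefer nodes in racks other than the coordinator's, and
     * spread the two picks across two distinct racks when possible.  If only
     * one candidate exists it is used alone; an empty result corresponds to
     * the UnavailableException case (unless CL is ANY), handled elsewhere.
     */
    static List<Node> chooseBatchlogNodes(List<Node> localDc, String coordinatorRack) {
        List<Node> live = localDc.stream()
                .filter(Node::alive)
                .collect(Collectors.toList());
        // Prefer candidates outside the coordinator's own rack, when any exist.
        List<Node> otherRacks = live.stream()
                .filter(n -> !n.rack().equals(coordinatorRack))
                .collect(Collectors.toList());
        List<Node> pool = otherRacks.isEmpty() ? live : otherRacks;

        // First pass: pick at most two nodes from two distinct racks.
        List<Node> chosen = new ArrayList<>();
        Set<String> racksUsed = new HashSet<>();
        for (Node n : pool) {
            if (chosen.size() == 2) break;
            if (racksUsed.add(n.rack())) chosen.add(n);
        }
        // Second pass: if distinct racks were not available, allow same-rack picks.
        for (Node n : pool) {
            if (chosen.size() == 2) break;
            if (!chosen.contains(n)) chosen.add(n);
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Node> dc = List.of(
                new Node("a", "r1", true), new Node("b", "r2", true),
                new Node("c", "r2", true), new Node("d", "r3", true));
        // Coordinator sits in rack r1: the two picks come from r2 and r3.
        System.out.println(chooseBatchlogNodes(dc, "r1"));
    }
}
```

With a coordinator in rack r1 and live nodes in r2, r2, and r3, the sketch picks one node each from r2 and r3; with all live candidates in a single other rack it falls back to two same-rack nodes, and with only the coordinator's rack alive it falls back to that rack.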
