Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.
The "ReadPathForUsers" page has been changed by MichaelEdge:
https://wiki.apache.org/cassandra/ReadPathForUsers?action=diff&rev1=1&rev2=2

== The Local Coordinator ==
The local coordinator receives the read request from the client and performs the following:
- 1. The local coordinator determines which node is responsible for storing the data; it does this using the Partitioner to hash the partition key
- 1. The local coordinator sends the read request to the replica node storing the data
+ 1. The local coordinator determines which nodes are responsible for storing the data:
+   * The first replica is chosen based on the Partitioner hashing the primary key
+   * Other replicas are chosen based on the replication strategy defined for the keyspace
+ 1. The local coordinator sends a read request to the fastest replica. All Cassandra snitches utilise the dynamic snitch to monitor read latency between nodes and maintain a list of the fastest responding replicas (or more accurately, the snitch calculates and maintains a ‘badness score’ per node, and read requests are routed to nodes with the lowest ‘badness’ score).
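A minimal sketch of that ‘fastest replica’ choice, assuming a hypothetical per-node badness score map rather than Cassandra’s actual dynamic snitch API:

{{{#!java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: route the full data read to the replica with the
// lowest dynamic-snitch "badness" score. Addresses and scores are made up.
public class FastestReplicaSketch {

    static String fastestReplica(Map<String, Double> badnessByReplica) {
        return badnessByReplica.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow(() -> new IllegalArgumentException("no replicas"));
    }

    public static void main(String[] args) {
        Map<String, Double> badness = new HashMap<>();
        badness.put("10.0.0.1", 0.7);   // slower responder
        badness.put("10.0.0.2", 0.1);   // currently the fastest responder
        badness.put("10.0.0.3", 0.4);
        System.out.println("Send full read to " + fastestReplica(badness));
    }
}
}}}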
== Replica Node ==
- The replica node receives the read request from the local coordinator and performs the following:
+ The replica node receives the read request from the local coordinator and checks whether the requested row exists in the row cache. The row cache is a read-through cache and will only contain the requested data if it was previously read.

=== Requested data in row cache ===
- 1. Check if the row cache contains the requested row. The row cache is a read-through cache and will only contain the requested data if it was previously read.
 1. If the row is in the row cache, return the data to the local coordinator. Since the row cache already contains fully merged data there is no need to check anywhere else for the data and the read request can now be considered complete.

=== Requested data not in row cache ===
The data must now be read from the SSTables and MemTable:

@@ -34, +35 @@

It might seem an unnecessary overhead to read data from both MemTable and SSTables if the data for a partition key exists in the MemTable. After all, doesn’t the MemTable contain the latest copy of the data for a partition key? It depends on the type of operation applied to the data. Consider the following examples:
 1. If a new row is inserted, this new row will exist in the MemTable and will only exist in an SSTable when the MemTable is flushed. Prior to the flush the bloom filters will indicate that the row does not exist in any SSTables, and the row will therefore be read from the MemTable only (assuming it has not previously been read and exists in the row cache).
- 1. After the MemTable containing the new row is flushed to disk, the data for this partition key exists in an SSTable only, and no longer exists in the MemTable. The flushed MemTable has been replaced by a new, empty MemTable.
+ 1. After the MemTable containing the new row is flushed to disk, the data for this partition key exists in an SSTable only, and no longer exists in the MemTable. The flushed MemTable has been replaced by a new, empty MemTable.
- 1. If the new row is now updated, the updated cells will exist in the MemTable. However, not every cell will exist in the MemTable, only those that were updated (the MemTable is never populated with the current values from the SSTables). To obtain a complete view of the data for this partition key both the MemTable and SSTable must be read.
+ 1. If the new row is now updated, the updated cells will exist in the MemTable. However, not every cell will exist in the MemTable, only those that were updated (the MemTable is never populated with the current values from the SSTables). To obtain a complete view of the data for this partition key both the MemTable and SSTable must be read.

A MemTable is therefore a write-back cache that temporarily stores a copy of data by partition key, prior to that data being flushed to durable storage in the form of SSTables on disk. Unlike a true write-back cache (where changed data exists in the cache only), the changed data is also made durable in a Commit Log so the MemTable can be rebuilt in the event of a failure.
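The cell-level merge described above can be pictured with a short sketch; the Cell type, column names and timestamps below are hypothetical stand-ins, not Cassandra internals. For each column, the value with the newest write timestamp wins, whether it came from the MemTable or from an SSTable:

{{{#!java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of merging a row read from several sources
// (SSTables plus the MemTable) cell by cell, newest timestamp wins.
public class CellMergeSketch {

    record Cell(Object value, long writeTimestampMicros) {}

    static Map<String, Cell> merge(List<Map<String, Cell>> sources) {
        Map<String, Cell> merged = new HashMap<>();
        for (Map<String, Cell> source : sources) {
            source.forEach((column, cell) ->
                merged.merge(column, cell, (oldCell, newCell) ->
                    newCell.writeTimestampMicros() > oldCell.writeTimestampMicros()
                        ? newCell : oldCell));
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Cell> fromSSTable = Map.of(
            "name",  new Cell("Alice", 1_000L),
            "email", new Cell("alice@old.example", 1_000L));
        Map<String, Cell> fromMemTable = Map.of(
            "email", new Cell("alice@new.example", 2_000L)); // only the updated cell
        // The complete row is the SSTable cells overridden by any newer MemTable cells.
        System.out.println(merge(List.of(fromSSTable, fromMemTable)));
    }
}
}}}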
== Consistency Level and the Read Path ==
Cassandra’s tunable consistency applies to read requests as well as writes. The consistency level determines the number of replica nodes that must respond before the results of a read request can be sent back to the client; by tuning the consistency level a user can determine whether a read request should return fully consistent data, or whether stale, eventually consistent data is acceptable. The diagram and explanation below describe how Cassandra responds to read requests where the consistency level is greater than ONE. For clarity this section does not consider multi data-centre read requests.

{{attachment:CassandraReadConsistencyLevel.png|Cassandra Read Consistency Level|width=800}}

- 1. The local coordinator determines which nodes are responsible for storing the data:
+ 1. The local coordinator determines which nodes are responsible for storing the data:
   * The first replica is chosen based on the Partitioner hashing the primary key
   * Other replicas are chosen based on the replication strategy defined for the keyspace
- 1. The local coordinator sends a read request to the fastest replica. All Cassandra snitches utilise the dynamic snitch to monitor read latency between nodes and maintain a list of the fastest responding replicas (or more accurately, the snitch calculates and maintains a ‘badness score’ per node, and read requests are routed to nodes with the lowest ‘badness’ score).
+ 1. The local coordinator sends a read request to the fastest replica.
- 1. The fastest replica performs a read according to the ‘read path’ described above.
+ 1. The fastest replica performs a read according to the ‘read path’ described above.
- 1. The local coordinator sends a ‘digest’ read request to the other replica nodes; these nodes calculate a hash of the data being requested and return this to the local coordinator.
+ 1. The local coordinator sends a ‘digest’ read request to the other replica nodes; these nodes calculate a hash of the data being requested and return this to the local coordinator.
- 1. The local coordinator compares the hashes from all replica nodes. If they match it indicates that all nodes contain exactly the same version of the data; in this case the data from the fastest replica is returned to the client.
+ 1. The local coordinator compares the hashes from all replica nodes. If they match it indicates that all nodes contain exactly the same version of the data; in this case the data from the fastest replica is returned to the client.
- 1. If the digests do not match then a conflict resolution process is necessary:
+ 1. If the digests do not match then a conflict resolution process is necessary:
   * Read data from all replica nodes (with the exception of the fastest replica, as this has already responded to a full read request) according to the ‘read path’ described above.
   * Merge the data cell by cell based on timestamp. For each cell, the data from all replicas is checked and the most recent timestamp wins.
   * Return the merged data to the client.
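A rough sketch of the digest comparison that drives this decision; the row serialisation is invented and MD5 merely stands in for whatever hash the replicas compute:

{{{#!java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

// Illustrative only: the coordinator compares per-replica digests and falls
// back to a full read plus timestamp-based merge when they disagree.
public class DigestReadSketch {

    static byte[] digestOf(String serialisedRow) throws Exception {
        return MessageDigest.getInstance("MD5")
                            .digest(serialisedRow.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        String fromFastestReplica = "name=Alice@1000,email=alice@new.example@2000";
        String fromOtherReplica   = "name=Alice@1000,email=alice@old.example@1000";

        if (Arrays.equals(digestOf(fromFastestReplica), digestOf(fromOtherReplica))) {
            System.out.println("Digests match: return the fastest replica's data");
        } else {
            System.out.println("Digest mismatch: read all replicas and merge cell by cell, newest timestamp wins");
        }
    }
}
}}}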
