Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "WritePathForUsers" page has been changed by MichaelEdge:
https://wiki.apache.org/cassandra/WritePathForUsers?action=diff&rev1=18&rev2=19

  == The Local Coordinator ==
  The local coordinator receives the write request from the client and performs the following:
- 1. The local coordinator determines which nodes are responsible for storing the data:
+ 1. The local coordinator determines which nodes are responsible for storing the data:
- * The first replica is chosen based on the Partitioner hashing the primary key
+ * The first replica is chosen based on the Partitioner hashing the primary key
- * Other replicas are chosen based on replication strategy defined for the keyspace
+ * Other replicas are chosen based on replication strategy defined for the keyspace
- 2. The write request is then sent to all replica nodes simultaneously.
+ 1. The write request is then sent to all replica nodes simultaneously.
- 3. The total number of nodes receiving the write request is determined by the replication factor for the keyspace.
+ 1. The total number of nodes receiving the write request is determined by the replication factor for the keyspace.

- == Replica Nodes ==
- Replica nodes receive the write request from the local coordinator and perform the following:
- 1. Write data to the Commit Log. This is a sequential, memory-mapped log file, on disk, that can be used to rebuild MemTables if a crash occurs before the MemTable is flushed to disk.
- 2. Write data to the MemTable. MemTables are mutable, in-memory tables that are read/write. Each physical table on each replica node has an associated MemTable.
- 3. If the write request is a DELETE operation (whether a delete of a column or a row), a tombstone marker is written to the Commit Log and MemTable to indicate the delete.
- 4. If row caching is used, invalidate the cache for that row. Row cache is populated on read only, so it must be invalidated when data for that row is written.
- 5. Acknowledge the write request back to the local coordinator.
- The local coordinator waits for the appropriate number of acknowledgements (dependent on the consistency level for this write request) before acknowledging back to the client.
- Flushing MemTables
- MemTables are flushed to disk based on various factors, some of which include:
- • commitlog_total_space_in_mb is exceeded
- • memtable_total_space_in_mb is exceeded
- • ‘Nodetool flush’ command is executed
- • Etc.
- Each flush of a MemTable results in one new, immutable SSTable on disk. After the flush an SSTable (Sorted String Table) is read-only. As with the write to the Commit Log, the write to the SSTable data file is a sequential write operation. An SSTable consists of multiple files, including the following:
- • Bloom Filter
- • Index
- • Compression File (optional)
- • Statistics File
- • Data File
- • Summary
- • TOC.txt
- Each MemTable flush executes the following steps:
- 1. Sort the MemTable columns by row key
- 2. Write the Bloom Filter
- 3. Write the Index
- 4. Serialise and write the data to the SSTable Data File
- 5. Write Compression File (if compression is used)
- 6. Write Statistics File
- 7. Purge the written data from the Commit Log
- Unavailable Replica Nodes and Hinted Handoff
- When a local coordinator is unable to send data to a replica node due to the replica node being unavailable, the local coordinator stores the data in its local system.hints table; this process is known as Hinted Handoff. The data is stored for a default period of 3 hours. When the replica node comes back online the coordinator node will send the data to the replica node.
- Write Path Advantages
- • The write path is one of Cassandra’s key strengths: for each write request one sequential disk write plus one in-memory write occur, both of which are extremely fast.
- • During a write operation, Cassandra never reads before writing, never rewrites data, never deletes data and never performs random I/O.
-
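
The coordinator steps described in the page text above can be sketched roughly as follows. This is a minimal illustration, not Cassandra's implementation: the names `token_for`, `replicas_for` and `required_acks` are hypothetical, MD5 stands in for the real partitioner hash, and the "walk the ring clockwise" rule is an assumed simple replication strategy.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Hash a key to a 64-bit token (MD5 is a stand-in for the real partitioner)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas_for(partition_key, ring, replication_factor):
    """Pick the replica nodes for a key.

    ring is a list of (token, node_name) pairs sorted by token. The first
    replica owns the first token >= the key's token (wrapping around); the
    remaining replicas are the next distinct nodes clockwise on the ring.
    """
    tokens = [token for token, _ in ring]
    start = bisect.bisect_left(tokens, token_for(partition_key)) % len(ring)
    chosen = []
    for offset in range(len(ring)):
        node = ring[(start + offset) % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
        if len(chosen) == replication_factor:
            break
    return chosen


def required_acks(consistency_level, replication_factor):
    """Number of replica acknowledgements the coordinator waits for."""
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return levels[consistency_level]
```

The write itself goes to every node returned by `replicas_for`; the consistency level only controls how many acknowledgements are awaited before the client is answered.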

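
The replica-side steps and the MemTable flush described in the page text can be illustrated with a toy in-memory model. This is a sketch under stated assumptions: the `Replica` class and its attributes are hypothetical, a Python list stands in for the on-disk Commit Log, an SSTable is reduced to a sorted tuple, and the flush trigger is a simple row count rather than the space thresholds named in the text.

```python
TOMBSTONE = object()  # marker value recorded for deleted rows


class Replica:
    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only; stands in for the sequential log file
        self.memtable = {}     # mutable in-memory table
        self.sstables = []     # one immutable, sorted tuple per flush
        self.row_cache = {}    # populated on read only
        self.memtable_limit = memtable_limit  # assumed row-count threshold

    def write(self, row_key, value):
        # 1. Sequential append to the commit log.
        self.commit_log.append((row_key, value))
        # 2. In-memory write to the memtable.
        self.memtable[row_key] = value
        # 3. Invalidate any cached copy of the row.
        self.row_cache.pop(row_key, None)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()
        # 4. Acknowledge back to the coordinator.
        return True

    def delete(self, row_key):
        # A delete is a write of a tombstone; nothing is removed in place.
        return self.write(row_key, TOMBSTONE)

    def flush(self):
        if not self.memtable:
            return
        # Sort by row key and freeze the result as a new "SSTable".
        rows = sorted(self.memtable.items(), key=lambda kv: kv[0])
        self.sstables.append(tuple(rows))
        self.memtable = {}
        # The flushed data no longer needs to be replayable from the log.
        self.commit_log.clear()
```

Note that a write never reads or rewrites an existing SSTable in this model: it only appends to the log and updates memory, which matches the advantages listed in the page text.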