Dear Wiki user, You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.
The "WritePathForUsers" page has been changed by MichaelEdge:
https://wiki.apache.org/cassandra/WritePathForUsers?action=diff&rev1=10&rev2=11

  {{attachment:CassandraWritePath.png|text describing image|width=700}}

+ Write Path
+
+ The Local Coordinator
+ The local coordinator receives the write request from the client and performs the following:
+ 1. The local coordinator determines which nodes are responsible for storing the data:
+    • The first replica is chosen based on the Partitioner hashing the primary key
+    • Other replicas are chosen based on the replication strategy defined for the keyspace
+ 2. The write request is then sent to all replica nodes simultaneously.
+ 3. The total number of nodes receiving the write request is determined by the replication factor for the keyspace.
+
+ Replica Nodes
+ Replica nodes receive the write request from the local coordinator and perform the following:
+ 1. Write data to the Commit Log. This is a sequential, memory-mapped log file on disk that can be used to rebuild MemTables if a crash occurs before the MemTable is flushed to disk.
+ 2. Write data to the MemTable. MemTables are mutable, in-memory tables that serve both reads and writes. Each physical table on each replica node has an associated MemTable.
+ 3. If the write request is a DELETE operation (whether a delete of a column or a row), a tombstone marker is written to the Commit Log and MemTable to indicate the delete.
+ 4. If row caching is used, invalidate the cache for that row. The row cache is populated on read only, so it must be invalidated when data for that row is written.
+ 5. Acknowledge the write request back to the local coordinator.
+
+ The local coordinator waits for the appropriate number of acknowledgements (dependent on the consistency level for this write request) before acknowledging back to the client.
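The coordinator and replica steps above can be sketched as a toy in-memory model. This is an illustration only, not Cassandra's implementation: ReplicaNode, pick_replicas, required_acks and coordinate_write are hypothetical names, the ring is a plain sorted list of (token, node) pairs, the caller supplies the key's token instead of a real Partitioner, and replicas are picked by walking the ring clockwise, roughly what a simple, non-rack-aware replication strategy would do.

```python
from bisect import bisect_left

TOMBSTONE = object()  # stand-in marker for a deleted value (assumption)


class ReplicaNode:
    """Toy replica: an append-only commit log, a memtable and a row cache."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.commit_log = []  # models the sequential on-disk log
        self.memtable = {}    # row key -> latest value, in memory
        self.row_cache = {}   # populated on read only

    def write(self, key, value):
        """Apply one mutation; the return value models the acknowledgement."""
        if not self.up:
            return False
        self.commit_log.append((key, value))  # step 1: commit log first
        self.memtable[key] = value            # step 2: then the memtable
        self.row_cache.pop(key, None)         # step 4: invalidate the cached row
        return True                           # step 5: acknowledge

    def delete(self, key):
        # step 3: a delete is a write of a tombstone marker
        return self.write(key, TOMBSTONE)


def pick_replicas(ring, key_token, replication_factor):
    """Return the node owning key_token plus the next nodes clockwise.

    ring must be a list of (token, node) pairs sorted by token.
    """
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, key_token) % len(ring)  # wrap past the last token
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


def required_acks(consistency_level, replication_factor):
    """Acknowledgements needed for a few common consistency levels."""
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return levels[consistency_level]


def coordinate_write(ring, key_token, key, value,
                     replication_factor, consistency_level):
    """Send the write to every replica and report whether enough acknowledged."""
    replicas = pick_replicas(ring, key_token, replication_factor)
    acks = sum(1 for node in replicas if node.write(key, value))
    return acks >= required_acks(consistency_level, replication_factor)
```

With a replication factor of 3 and one of the three replicas down, a QUORUM write in this model still succeeds (2 of 3 acknowledgements), while an ALL write does not.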
+
+ Flushing MemTables
+ MemTables are flushed to disk based on various factors, some of which include:
+    • commitlog_total_space_in_mb is exceeded
+    • memtable_total_space_in_mb is exceeded
+    • The 'nodetool flush' command is executed
+
+ Each flush of a MemTable results in one new, immutable SSTable on disk. After the flush an SSTable (Sorted String Table) is read-only. As with the write to the Commit Log, the write to the SSTable data file is a sequential write operation. An SSTable consists of multiple files, including the following:
+    • Bloom Filter
+    • Index
+    • Compression File (optional)
+    • Statistics File
+    • Data File
+    • Summary
+    • TOC.txt
+
+ Each MemTable flush executes the following steps:
+ 1. Sort the MemTable columns by row key
+ 2. Write the Bloom Filter
+ 3. Write the Index
+ 4. Serialise and write the data to the SSTable Data File
+ 5. Write Compression File (if compression is used)
+ 6. Write Statistics File
+ 7. Purge the written data from the Commit Log
+
+ Unavailable Replica Nodes and Hinted Handoff
+ When a local coordinator is unable to send data to a replica node because the replica node is unavailable, the local coordinator stores the data in its local system.hints table; this process is known as Hinted Handoff. By default, the coordinator collects hints for a replica for up to 3 hours after it becomes unavailable. When the replica node comes back online, the coordinator node sends the stored data to it.
+
+ Write Path Advantages
+    • The write path is one of Cassandra’s key strengths: for each write request one sequential disk write plus one in-memory write occur, both of which are extremely fast.
+    • During a write operation, Cassandra never reads before writing, never rewrites data, never deletes data and never performs random I/O.
+
+ ---- /!\ '''End of edit conflict''' ----
+
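The flush steps described above (sort by row key, write the Bloom Filter and Index, write the data, purge the Commit Log) can be sketched in memory as follows. This is a simplified illustration under stated assumptions: BloomFilter and flush_memtable are hypothetical names, the "SSTable" is a plain dict rather than a set of files, the compression and statistics steps are omitted, and the whole commit log is cleared at once, whereas a real node would only discard log data that has been fully flushed.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, size_bits=64, hashes=3):
        self.size_bits = size_bits
        self.hashes = hashes
        self.bits = 0  # bit set stored in a single integer

    def _positions(self, key):
        # Derive several bit positions per key by salting one hash function.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def flush_memtable(memtable, commit_log):
    """Turn a memtable into an immutable, sorted 'SSTable' and purge the log."""
    # Sort rows by row key; a tuple keeps the flushed data read-only.
    rows = tuple(sorted(memtable.items(), key=lambda kv: kv[0]))

    # Bloom filter over the row keys present in this SSTable.
    bloom = BloomFilter()
    for key, _ in rows:
        bloom.add(key)

    # Index: row key -> position of the row in the data.
    index = {key: pos for pos, (key, _) in enumerate(rows)}

    sstable = {"bloom": bloom, "index": index, "data": rows}

    # The flushed data no longer needs to be replayable from the commit log.
    memtable.clear()
    commit_log.clear()
    return sstable
```

Because the rows are sorted before they are written, the data can be emitted in one sequential pass, which matches the sequential-write property described above.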