Benedict Elliott Smith created CASSANDRA-20069:
--------------------------------------------------

             Summary: Improve Accord Work Queueing (and misc perf improvements)
                 Key: CASSANDRA-20069
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20069
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Benedict Elliott Smith


Work can now be cancelled, and commands serving remote requests now cancel 
queued work if they time out. This prevents run-away work growth, as work does 
not outlive its useful lifespan.
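
As a rough illustration of the cancellation idea (not the actual Accord 
implementation), the sketch below shows a queue whose entries can be cancelled 
by the request that submitted them, for example when it times out. 
CancellableWorkQueue, Work, submit and drain are hypothetical names, and the 
sketch is single-threaded for brevity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration; none of these names come from the Accord codebase.
// The queue itself is not thread-safe; a real work queue would need locking.
final class CancellableWorkQueue {
    /** A queued unit of work that can be cancelled before it starts running. */
    static final class Work {
        final String name;
        final Runnable body;
        private final AtomicBoolean cancelled = new AtomicBoolean();

        Work(String name, Runnable body) {
            this.name = name;
            this.body = body;
        }

        /** Returns true only for the call that actually cancels the work. */
        boolean cancel() {
            return cancelled.compareAndSet(false, true);
        }

        boolean isCancelled() {
            return cancelled.get();
        }
    }

    private final Queue<Work> queue = new ArrayDeque<>();

    Work submit(String name, Runnable body) {
        Work work = new Work(name, body);
        queue.add(work);
        return work;
    }

    /** Runs everything still wanted; cancelled entries are dropped unexecuted. */
    List<String> drain() {
        List<String> executed = new ArrayList<>();
        Work work;
        while ((work = queue.poll()) != null) {
            if (work.isCancelled())
                continue; // the requester gave up, so skip the work entirely
            work.body.run();
            executed.add(work.name);
        }
        return executed;
    }
}
```

A request handler would keep the Work handle returned by submit and call 
cancel() from its timeout path, so the queue never spends time on work whose 
result nobody is waiting for.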

CommandStores, threads and caches are now only loosely coupled, so it is 
possible to independently tune:
 * the total number of threads
 * the number of work queue/cache units we distribute the threads amongst
 * the number of CommandStores per table/shard (which are distributed between 
queue/cache units, the threads of which will execute CommandStore work)
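
To make the three knobs concrete, here is a minimal sketch that spreads a 
number of threads and a number of CommandStores across a number of queue/cache 
units. The round-robin policy and the names ExecutorLayout and roundRobin are 
assumptions for illustration only; the ticket does not say how the 
distribution is actually performed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; the real assignment policy may differ.
final class ExecutorLayout {
    /**
     * Assigns item i to unit (i mod units) and returns, for each unit,
     * the indexes of the items it owns.
     */
    static List<List<Integer>> roundRobin(int items, int units) {
        if (units <= 0)
            throw new IllegalArgumentException("units must be positive");
        List<List<Integer>> byUnit = new ArrayList<>();
        for (int u = 0; u < units; u++)
            byUnit.add(new ArrayList<>());
        for (int i = 0; i < items; i++)
            byUnit.get(i % units).add(i);
        return byUnit;
    }
}
```

For example, with 8 threads, 2 queue/cache units and 6 CommandStores, 
roundRobin(8, 2) gives each unit 4 threads and roundRobin(6, 2) gives each 
unit 3 CommandStores, so any of a unit's 4 threads may execute work for any of 
its 3 stores.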

Mutual exclusivity is managed separately for the queue/cache unit and each 
CommandStore, and the locks are held only for as long as necessary, so 
multiple threads can service the same CommandStore(s). Given this threading 
model, it is now possible for Accord threads to perform all of the 
loading/saving work themselves, reducing queueing delay and context switching 
costs. This is configurable; the default is now to do this work on the Accord 
work pool (as is already the case with Paxos for most IO). Accord state reads 
can be scheduled and completed from any thread, so that we do not incur 
multiple queue delays when preparing work for a command store. Further 
improvements can be made here to permit the event loop to answer in-cache 
queries and, under a future async-io model, to submit read requests directly.

Misc perf improvements:
 * Write directly to Memtable for cache evictions, using putIfAbsent immediately
 * Use LCS on CommandsForKey table for faster reads
 * Flatten UUID fields into TableId to reduce indirection for comparisons
 * Introduce asymmetric comparisons to BTreeMap for faster schema lookups
 * Read TimestampsForKey directly to avoid parsing CQL
 * Save summary information in RedundantBefore to short-circuit executions
 * Send Stable message only as necessary on Execute, to reduce load on replicas
 * Ensure journal entries are immediately visible to replay without handing 
over to another thread, so the normal path can avoid context switch and 
listener overheads
 * Journal periodic mode should fsync only as necessary
 * Use OpOrder to guard Journal Segment read access (avoiding taking individual 
references, which can be costly)
 * AccordCache can “shrink” (serialise) entries instead of evicting, to 
increase effective capacity (evicting any already-shrunk entries that are 
encountered)
 * EphemeralRead cache entries are evicted only on timeout of the remote 
request, and are not otherwise persisted
 * Work is scheduled by first arrival time, not by read completion time - work 
that is slow to read jumps the queue once data is in memory to serve it, to 
reduce latency variability
 * Some operations may now partially execute without waiting for all state to 
be brought into memory
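
The arrival-time scheduling item above can be sketched as a priority queue 
keyed on an arrival sequence number rather than on the moment a task's data 
finished loading. This is only an illustration under that assumption; 
ArrivalOrderedScheduler, arrive, loaded and next are hypothetical names, not 
Accord APIs.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical illustration; single-threaded for brevity.
final class ArrivalOrderedScheduler {
    static final class Task {
        final String name;
        final long arrivalSeq; // fixed when the task first arrives

        Task(String name, long arrivalSeq) {
            this.name = name;
            this.arrivalSeq = arrivalSeq;
        }
    }

    private long nextSeq;

    // Runnable tasks, ordered by when they arrived, not when they became ready.
    private final PriorityQueue<Task> ready =
        new PriorityQueue<>(Comparator.comparingLong((Task t) -> t.arrivalSeq));

    /** Records the arrival of a task whose state may still need loading. */
    Task arrive(String name) {
        return new Task(name, nextSeq++);
    }

    /** Called once the task's state is in memory and it can be executed. */
    void loaded(Task task) {
        ready.add(task);
    }

    /** Returns the earliest-arrived runnable task, or null if none is ready. */
    Task next() {
        return ready.poll();
    }
}
```

If tasks a, b and c arrive in that order but a's read completes last, a is 
still returned by next() ahead of b and c as soon as it is loaded, provided 
they have not already been taken.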



