[
https://issues.apache.org/jira/browse/CASSANDRA-10989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aleksey Yeschenko updated CASSANDRA-10989:
------------------------------------------
Labels: performance (was: )
> Move away from SEDA to TPC
> --------------------------
>
> Key: CASSANDRA-10989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10989
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Aleksey Yeschenko
> Labels: performance
>
> Since its inception, Cassandra has been utilising
> [SEDA|http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf] at its core.
> As originally conceived, it means every request is split into several stages,
> each backed by its own thread pool (see the sketch after this list). That
> imposes certain challenges:
> - thread parking/unparking overheads (partially improved by SEPExecutor in
> CASSANDRA-4718)
> - extensive context switching (i-/d-cache thrashing)
> - less than optimal multiple-writer/multiple-reader data structures for
> memtables, partitions, metrics, and more
> - concurrent code that is hard to grok
> - large number of GC roots, longer time-to-safepoint (TTSP) pauses
> - increased complexity of moving data structures off the Java heap
> - inability to easily balance writes/reads/compaction/flushing
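> To make the staging concrete, below is a minimal, purely illustrative sketch
> (stage names, pool sizes and the three-hop pipeline are hypothetical, not
> Cassandra's actual stages); every hop between pools is a potential
> park/unpark and a cache-unfriendly context switch:
> {code:java}
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
>
> // Illustrative SEDA-style pipeline: each stage owns a thread pool,
> // and a single request hops across all of them.
> public class StagedWriteExample
> {
>     private final ExecutorService parseStage    = Executors.newFixedThreadPool(4);
>     private final ExecutorService mutationStage = Executors.newFixedThreadPool(32);
>     private final ExecutorService responseStage = Executors.newFixedThreadPool(4);
>
>     public void handleWrite(byte[] frame)
>     {
>         parseStage.execute(() -> {
>             Object mutation = parse(frame);                  // hop 1: parse pool
>             mutationStage.execute(() -> {
>                 apply(mutation);                             // hop 2: mutation pool
>                 responseStage.execute(this::respond);        // hop 3: response pool
>             });
>         });
>     }
>
>     private Object parse(byte[] frame) { return frame; }
>     private void apply(Object mutation) {}
>     private void respond() {}
> }
> {code}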
> The latency implications of SEDA have been acknowledged by the authors
> themselves - see the 2010 [retrospective on
> SEDA|http://matt-welsh.blogspot.co.uk/2010/07/retrospective-on-seda.html].
> To fix these issues (and more), two years ago at NGCC [~benedict] suggested
> moving Cassandra away from SEDA to the more mechanically sympathetic
> thread-per-core (TPC) architecture. See the slides from the original
> presentation
> [here|https://docs.google.com/presentation/d/19_U8I7mq9JKBjgPmmi6Hri3y308QEx1FmXLt-53QqEw/edit?ts=56265eb4#slide=id.g98ad32b25_1_19].
> In a nutshell, each core would become a logical shared-nothing micro-instance
> of Cassandra, taking over a portion of the node’s range {{*}}.
> Client connections would be assigned randomly to one of the cores (sharing a
> single listen socket). A request that cannot be served by the client’s core
> would be proxied to the one owning the data, similar to the way we perform
> remote coordination today.
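> As a rough sketch of what that routing could look like (names and the token
> split below are hypothetical, not a committed design):
> {code:java}
> import java.util.concurrent.CompletableFuture;
> import java.util.concurrent.ThreadLocalRandom;
>
> // Illustrative only: connections land on a random core; a request whose
> // partition token falls outside that core's sub-range is handed to the
> // owning core, mirroring remote coordination.
> public class CoreRouter
> {
>     private final EventLoopCore[] cores;
>
>     public CoreRouter(EventLoopCore[] cores) { this.cores = cores; }
>
>     // Called once per new client connection (all cores share the listen socket).
>     public EventLoopCore assignConnection()
>     {
>         return cores[ThreadLocalRandom.current().nextInt(cores.length)];
>     }
>
>     // Called per request on the connection's core.
>     public CompletableFuture<Object> execute(EventLoopCore local, long token, Request request)
>     {
>         EventLoopCore owner = cores[ownerIndex(token)];
>         if (owner == local)
>             return local.executeLocally(request);
>         return owner.submit(request); // "proxy" to the owning core's event loop
>     }
>
>     private int ownerIndex(long token)
>     {
>         // Illustrative split of the token space between cores.
>         return (int) Long.remainderUnsigned(token, cores.length);
>     }
>
>     interface Request {}
>
>     interface EventLoopCore
>     {
>         CompletableFuture<Object> executeLocally(Request request);
>         CompletableFuture<Object> submit(Request request); // enqueue onto that core's loop
>     }
> }
> {code}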
> Each thread (pinned to an exclusive core) would have a single event loop, and
> be responsible for both serving requests and performing maintenance tasks
> (flushing, compaction, repair), scheduling them intelligently.
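> A bare-bones, hypothetical shape of such a loop (the queueing, scheduling
> policy and core pinning are all placeholders here):
> {code:java}
> import java.util.PriorityQueue;
> import java.util.concurrent.ConcurrentLinkedQueue;
>
> // Illustrative single-threaded loop for one core: client work is drained from
> // a queue owned by this core, and maintenance work (flush, compaction, repair)
> // runs in bounded slices in between, so neither starves the other. Pinning the
> // thread to its core is assumed to happen externally (e.g. via CPU affinity).
> public class CoreEventLoop implements Runnable
> {
>     private final ConcurrentLinkedQueue<Runnable> requests = new ConcurrentLinkedQueue<>();
>     private final PriorityQueue<MaintenanceTask> maintenance = new PriorityQueue<>(); // loop thread only
>     private volatile boolean shutdown;
>
>     public void submit(Runnable request) { requests.add(request); } // may be called from other cores
>     public void schedule(MaintenanceTask task) { maintenance.add(task); }
>
>     @Override
>     public void run()
>     {
>         while (!shutdown)
>         {
>             boolean didWork = false;
>
>             Runnable request = requests.poll();
>             if (request != null) { request.run(); didWork = true; }  // client work first
>
>             MaintenanceTask task = maintenance.peek();
>             if (task != null && task.shouldRunNow())
>             {
>                 maintenance.poll().runSlice();                       // then a bounded background slice
>                 didWork = true;
>             }
>
>             if (!didWork)
>                 Thread.yield();                                      // idle: don't burn the core
>         }
>     }
>
>     interface MaintenanceTask extends Comparable<MaintenanceTask>
>     {
>         boolean shouldRunNow();
>         void runSlice();
>     }
> }
> {code}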
> One notable exception from the original proposal is that we cannot,
> unfortunately, use Linux AIO for file I/O, as it's only properly implemented
> for XFS. We might, however, have a specialised implementation for XFS and
> Windows (based on IOCP) later. In the meantime, we have no choice but to hand
> off I/O that cannot be served from cache to a separate threadpool.
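> For illustration, the handoff could look roughly like this (the cache, core
> and pool types below are stand-ins, not actual interfaces):
> {code:java}
> import java.util.concurrent.CompletableFuture;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.function.Consumer;
>
> // Illustrative interim I/O strategy: reads served from cache complete on the
> // core's event loop; misses go to a shared blocking-I/O pool, and the result
> // is re-enqueued onto the originating core so further processing stays
> // single-threaded.
> public class HandoffReader
> {
>     private final ExecutorService ioPool = Executors.newFixedThreadPool(8); // shared across cores
>
>     public void read(long position, Cache cache, Core core, Consumer<byte[]> onComplete)
>     {
>         byte[] cached = cache.get(position);
>         if (cached != null)
>         {
>             onComplete.accept(cached);  // cache hit: finish right here on the event loop
>             return;
>         }
>         CompletableFuture
>             .supplyAsync(() -> readFromDisk(position), ioPool)                  // blocking read, off-loop
>             .thenAccept(bytes -> core.execute(() -> onComplete.accept(bytes))); // hop back to the core
>     }
>
>     private byte[] readFromDisk(long position) { return new byte[0]; } // placeholder
>
>     interface Cache { byte[] get(long position); }
>     interface Core  { void execute(Runnable task); } // enqueue onto that core's event loop
> }
> {code}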
> Transitioning from SEDA to TPC will be done in stages, incrementally and in
> parallel.
> This is a high-level overview meta-ticket that will track JIRA issues for
> each individual stage.
> {{*}} They’ll still share certain things, like schema, gossip, the file I/O
> threadpool(s), and maybe MessagingService.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)