[ 
https://issues.apache.org/jira/browse/CASSANDRA-10989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-10989:
------------------------------------------
    Labels: performance  (was: )

> Move away from SEDA to TPC
> --------------------------
>
>                 Key: CASSANDRA-10989
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10989
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Aleksey Yeschenko
>              Labels: performance
>
> Since its inception, Cassandra has been utilising [SEDA 
> |http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf] at its core.
> As originally conceived, this means that every request is split into several 
> stages, each backed by its own thread pool. That imposes certain challenges:
> - thread parking/unparking overheads (partially improved by SEPExecutor in 
> CASSANDRA-4718)
> - extensive context switching (i-/d- caches thrashing)
> - less than optimal multiple-writer/multiple-reader data structures for 
> memtables, partitions, metrics, and more
> - hard to grok concurrent code
> - large number of GC roots, longer TTSP (time to safepoint)
> - increased complexity for moving data structures off java heap
> - inability to easily balance writes/reads/compaction/flushing
> Latency implications of SEDA have been acknowledged by the authors themselves 
> - see the 2010 [retrospective on 
> SEDA|http://matt-welsh.blogspot.co.uk/2010/07/retrospective-on-seda.html].
> To fix these issues (and more), two years ago at NGCC [~benedict] suggested 
> moving Cassandra away from SEDA to the more mechanically sympathetic 
> thread-per-core architecture (TPC). See the slides from the original presentation 
> [here|https://docs.google.com/presentation/d/19_U8I7mq9JKBjgPmmi6Hri3y308QEx1FmXLt-53QqEw/edit?ts=56265eb4#slide=id.g98ad32b25_1_19].
> In a nutshell, each core would become a logical shared-nothing micro instance 
> of Cassandra, taking over a portion of the node’s range {{*}}.
> Client connections would be assigned randomly to one of the cores (sharing a 
> single listen socket). A request that cannot be served by the client’s core 
> would be proxied to the core owning the data, similar to the way we perform 
> remote coordination today.
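> A minimal sketch of that routing, with purely hypothetical names (none of this 
> is existing Cassandra code), might look like:
> {code:java}
> // Illustrative only: hypothetical types, not actual Cassandra APIs.
> import java.util.concurrent.Executor;
> 
> interface Request
> {
>     long partitionToken();
>     void serve(); // must run on the core that owns the token's range
> }
> 
> final class CoreRouter
> {
>     private final Executor[] coreLoops; // one single-threaded event loop per core
> 
>     CoreRouter(Executor[] coreLoops)
>     {
>         this.coreLoops = coreLoops;
>     }
> 
>     /** Called from the event loop of the core the connection was assigned to. */
>     void handle(Request request, int connectionCore)
>     {
>         // Toy ownership function; the real thing would consult per-core token ranges.
>         int owner = Math.floorMod(Long.hashCode(request.partitionToken()), coreLoops.length);
>         if (owner == connectionCore)
>             request.serve();                          // fast path: data owned locally
>         else
>             coreLoops[owner].execute(request::serve); // proxy to the owning core
>     }
> }
> {code}
> The real design would of course also carry the response back to the connection’s 
> core; the sketch only shows the dispatch decision.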
> Each thread (pinned to an exclusive core) would have a single event loop, and 
> be responsible for both serving requests and performing maintenance tasks 
> (flushing, compaction, repair), scheduling them intelligently.
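> As a rough illustration of such a loop (again hypothetical, and far simpler 
> than any real scheduler would be):
> {code:java}
> // Illustrative only: a single-threaded per-core loop interleaving client
> // requests with background maintenance (flush, compaction, repair).
> import java.util.Queue;
> import java.util.concurrent.ConcurrentLinkedQueue;
> 
> final class CoreEventLoop implements Runnable
> {
>     private final Queue<Runnable> requests = new ConcurrentLinkedQueue<>();    // fed by the network layer
>     private final Queue<Runnable> maintenance = new ConcurrentLinkedQueue<>(); // flush/compaction/repair tasks
>     private volatile boolean running = true;
> 
>     void submitRequest(Runnable task)     { requests.add(task); }
>     void submitMaintenance(Runnable task) { maintenance.add(task); }
> 
>     @Override
>     public void run() // the thread running this would be pinned to its core
>     {
>         while (running)
>         {
>             Runnable task = requests.poll();   // client work takes priority...
>             if (task == null)
>                 task = maintenance.poll();     // ...maintenance runs when otherwise idle
>             if (task != null)
>                 task.run();
>             else
>                 Thread.onSpinWait();           // real code would park until new work arrives
>         }
>     }
> 
>     void shutdown() { running = false; }
> }
> {code}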
> One notable exception from the original proposal is that we cannot, 
> unfortunately, use Linux AIO for file I/O, as it's only properly implemented 
> for xfs. We might, however, have a specialised implementation for xfs and 
> Windows (based on IOCP) later. In the meantime, we have no choice other than 
> to hand off I/O that cannot be served from cache to a separate threadpool.
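> A sketch of that hand-off (hypothetical helpers standing in for the real cache 
> and disk paths, not an actual implementation):
> {code:java}
> // Illustrative only: reads that miss the cache go to a blocking I/O pool and
> // complete back on the originating core's event loop.
> import java.nio.ByteBuffer;
> import java.util.concurrent.CompletableFuture;
> import java.util.concurrent.Executor;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.function.Consumer;
> 
> final class FileReadService
> {
>     private final ExecutorService ioPool = Executors.newFixedThreadPool(8); // shared I/O threadpool
> 
>     void read(long position, Executor originLoop, Consumer<ByteBuffer> onComplete)
>     {
>         ByteBuffer cached = cacheLookup(position);
>         if (cached != null)
>         {
>             onComplete.accept(cached);                                 // served from cache, stays on the loop
>             return;
>         }
>         CompletableFuture
>             .supplyAsync(() -> blockingRead(position), ioPool)         // off-loop blocking read
>             .thenAccept(buf -> originLoop.execute(() -> onComplete.accept(buf))); // resume on the core's loop
>     }
> 
>     // Placeholders standing in for the real cache and disk paths.
>     private ByteBuffer cacheLookup(long position)  { return null; }
>     private ByteBuffer blockingRead(long position) { return ByteBuffer.allocate(4096); }
> }
> {code}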
> Transitioning from SEDA to TPC will be done in stages, incrementally and in 
> parallel.
> This is a high-level overview meta-ticket that will track JIRA issues for 
> each individual stage.
> {{*}} they’ll share certain things still, like schema, gossip, file I/O 
> threadpool(s), and maybe MessagingService.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
