Re: scylladb

Avi Kivity Sat, 11 Mar 2017 13:44:06 -0800

There are several issues at play here.

First, a database runs a large number of concurrent operations, each of which only consumes a small amount of CPU. The high concurrency is needed to hide latency: disk latency, or the latency of contacting a remote node. This means that the scheduler will need to switch contexts very often. A kernel thread scheduler knows very little about the application, so it has to switch a lot of context. A user level scheduler is tightly bound to the application, so it can perform the switching faster. There are also implications on the concurrency primitives in use (locks etc.) -- they will be much faster for the user-level scheduler, because they cooperate with the scheduler. For example, no atomic read-modify-write instructions need to be executed.
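
To make the point about cooperating locks concrete, here is a minimal sketch of a single-threaded run-to-completion scheduler with a lock built on top of it. It is not Seastar's API: the names Scheduler, CoopMutex, lock_then and run_demo are hypothetical, chosen only for illustration. The assumption is that exactly one task runs at a time on the owning kernel thread, so the lock state can be a plain bool and no atomic read-modify-write instruction is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-threaded scheduler: each task runs to completion
// before the next one starts, so tasks never preempt each other.
class Scheduler {
public:
    using Task = std::function<void()>;

    void schedule(Task t) { ready_.push_back(std::move(t)); }

    // Runs tasks until the ready queue is empty; returns how many ran.
    std::size_t run() {
        std::size_t executed = 0;
        while (!ready_.empty()) {
            Task t = std::move(ready_.front());
            ready_.pop_front();
            t();
            ++executed;
        }
        return executed;
    }

private:
    std::deque<Task> ready_;
};

// Hypothetical lock that cooperates with the scheduler. The state is a
// plain bool: only one task runs at a time on this thread, so no atomic
// instructions are needed.
class CoopMutex {
public:
    explicit CoopMutex(Scheduler& s) : sched_(s) {}

    // Schedules `critical` once the lock is available; otherwise parks it.
    void lock_then(Scheduler::Task critical) {
        if (!held_) {
            held_ = true;
            sched_.schedule(std::move(critical));
        } else {
            waiters_.push_back(std::move(critical));
        }
    }

    // Hands the lock directly to the next waiter, if there is one.
    void unlock() {
        assert(held_ && "unlock without a matching lock");
        if (waiters_.empty()) {
            held_ = false;
            return;
        }
        Scheduler::Task next = std::move(waiters_.front());
        waiters_.pop_front();
        sched_.schedule(std::move(next));
    }

private:
    Scheduler& sched_;
    bool held_ = false;
    std::deque<Scheduler::Task> waiters_;
};

// Three tasks contend for the lock; they enter the critical section
// one at a time, in the order they asked for it.
std::vector<std::string> run_demo() {
    Scheduler sched;
    CoopMutex mutex(sched);
    std::vector<std::string> log;
    for (const char* name : {"a", "b", "c"}) {
        std::string n = name;
        mutex.lock_then([&log, &mutex, n] {
            log.push_back(n);
            mutex.unlock();
        });
    }
    sched.run();
    return log;
}
```

A real framework would run one such scheduler per core and add I/O polling and cross-core messaging; this sketch only shows why the lock itself can be cheap.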

Second, how many (kernel) threads should you run? If you run too few threads, then you will not be able to saturate the CPU resources. This is a common problem with Cassandra -- it's very hard to get it to consume all of the CPU power on even a moderately large machine. On the other hand, if you have too many threads, you will see latency rise very quickly, because kernel scheduling granularity is on the order of milliseconds. User-level scheduling, because it leaves control in the hand of the application, allows you to both saturate the CPU and maintain low latency.
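
As a rough back-of-the-envelope sketch of the trade-off above: the function names shard_count and worst_case_wait_ms are hypothetical, and the model is a deliberate simplification (plain round-robin with a fixed timeslice, which is an assumption, not how any particular kernel scheduler actually behaves). It only illustrates why oversubscribing cores with kernel threads pushes latency up in millisecond-sized steps, while one thread per core leaves no kernel-level queueing at all.

```cpp
#include <cassert>
#include <thread>

// One kernel thread per hardware thread, the usual thread-per-core choice.
// hardware_concurrency() may return 0 when the value is unknown, so fall
// back to a single thread in that case.
unsigned shard_count() {
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : n;
}

// Simplified model (assumption): runnable threads on a core take turns in
// round-robin order, each holding the CPU for a full timeslice. A thread
// that just became runnable may wait for every other thread's slice.
double worst_case_wait_ms(unsigned runnable_threads_per_core,
                          double timeslice_ms) {
    assert(runnable_threads_per_core >= 1);
    return (runnable_threads_per_core - 1) * timeslice_ms;
}
```

Under this model, with an assumed 4 ms timeslice, one runnable thread per core waits 0 ms, while 11 runnable threads per core can wait up to 40 ms before a given thread gets the CPU again.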

There are other factors, like NUMA-friendliness, but in the end it all boils down to efficiency and control.


None of this is new btw, it's pretty common in the storage world.

Avi

On 03/11/2017 11:18 PM, Kant Kodali wrote:

Here is the Java version http://docs.paralleluniverse.co/quasar/ but I still don't see how user level scheduling can be beneficial (this is a well debated problem). How can this add to the performance? Or, say, why is user level scheduling necessary given the thread-per-core design and the callback mechanism?

On Sat, Mar 11, 2017 at 12:51 PM, Avi Kivity <a...@scylladb.com<mailto:a...@scylladb.com>> wrote:


    Scylla uses the Seastar framework, which provides for both
    user-level thread scheduling and simple run-to-completion tasks.

    Huge pages are limited to 2MB (and 1GB, but these aren't available
    as transparent hugepages).


    On 03/11/2017 10:26 PM, Kant Kodali wrote:

    @Dor

    1) You guys have a CPU scheduler? You mean a user level thread
    scheduler that maps user level threads to kernel level threads? I
    thought C++ by default creates native kernel threads, but sure,
    nothing will stop someone from creating a user level scheduling
    library, if that's what you are talking about.
    2) How can one create THP of size 1KB? According to this post
    <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-transhuge.html>
    it looks like the valid values are 2MB and 1GB.

    Thanks,
    kant

    On Sat, Mar 11, 2017 at 11:41 AM, Avi Kivity <a...@scylladb.com
    <mailto:a...@scylladb.com>> wrote:

        Agreed, I'd recommend treating benchmarks as a rough guide to
        see where there is potential, and following through with your
        own tests.

        On 03/11/2017 09:37 PM, Edward Capriolo wrote:


        Benchmarks are great for FUDly blog posts. Real world
        workloads matter more. Every NoSQL vendor wins their benchmarks.
