Re: scylladb

2017-03-11 Thread Dor Laor
On Sat, Mar 11, 2017 at 2:19 PM, Kant Kodali  wrote:

> My response is inline.
>
> On Sat, Mar 11, 2017 at 1:43 PM, Avi Kivity  wrote:
>
>> There are several issues at play here.
>>
>> First, a database runs a large number of concurrent operations, each of
>> which only consumes a small amount of CPU. The high concurrency is need to
>> hide latency: disk latency, or the latency of contacting a remote node.
>>
>
> *Ok so you are talking about hiding I/O latency.  If all these I/O are
> non-blocking system calls then a thread per core and callback mechanism
> should suffice isn't it?*
>

In general, yes, but in practice it's more complicated.
Each such thread runs many different tasks, so you need a mechanism to switch
between those tasks; in our case that is the Seastar continuation engine.
However, things get more complicated still. We found that we need a CPU
scheduler which takes into account the priority of the different tasks, such
as repair, compaction, streaming, read operations and write operations.
We always prioritize foreground operations over background ones, and thus
even when we repair TBs of data, latency stays very low (this feature is
coming in Scylla 1.8).
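To make that concrete, here is a minimal sketch, in C++, of a per-core
cooperative scheduler with two priority classes. It is purely illustrative
and is not Seastar's actual API; the class and function names are made up.

    // Minimal sketch of a per-core, cooperative scheduler with two priority
    // classes. Not Seastar's API; it only illustrates always draining
    // foreground (read/write) work before background (repair/compaction).
    #include <deque>
    #include <functional>

    using Task = std::function<void()>;

    class CoreScheduler {
        std::deque<Task> foreground_;   // client reads and writes
        std::deque<Task> background_;   // repair, compaction, streaming
    public:
        void submit_foreground(Task t) { foreground_.push_back(std::move(t)); }
        void submit_background(Task t) { background_.push_back(std::move(t)); }

        // One iteration of the reactor loop: run a small batch of tasks,
        // preferring foreground work so request latency stays low even while
        // terabytes are being repaired in the background.
        void run_some() {
            for (int budget = 0; budget < 64; ++budget) {
                if (!foreground_.empty()) {
                    auto t = std::move(foreground_.front());
                    foreground_.pop_front();
                    t();
                } else if (!background_.empty()) {
                    auto t = std::move(background_.front());
                    background_.pop_front();
                    t();
                } else {
                    break;  // nothing to do; a real loop would poll I/O here
                }
            }
        }
    };

A production scheduler would use weighted shares rather than strict priority,
so background work is throttled but never starved.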



>
>
>> This means that the scheduler will need to switch contexts very often. A
>> kernel thread scheduler knows very little about the application, so it has
>> to switch a lot of context.  A user level scheduler is tightly bound to the
>> application, so it can perform the switching faster.
>>
>
> *sure but this applies in other direction as well. A user level scheduler
> has no idea about kernel level scheduler either.  There is literally no
> coordination between kernel level scheduler and user level scheduler in
> linux or any major OS. It may be possible with OS's *
>

Correct. That's why we let the OS scheduler run just one thread per core,
and we bind that thread to the CPU. Inside it, we do our own scheduling with
the Seastar scheduler, and the OS neither knows nor cares.
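A minimal sketch of that "one pinned thread per core" setup, using plain
POSIX affinity calls (illustrative only, not Scylla's startup code):

    // Spawn one thread per core and pin each thread to its core with
    // pthread_setaffinity_np (Linux/glibc; g++ defines _GNU_SOURCE by
    // default). Error handling omitted.
    #include <pthread.h>
    #include <sched.h>
    #include <thread>
    #include <vector>

    void pin_to_core(unsigned core) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    int main() {
        unsigned n = std::thread::hardware_concurrency();
        std::vector<std::thread> shards;
        for (unsigned core = 0; core < n; ++core) {
            shards.emplace_back([core] {
                pin_to_core(core);
                // run this shard's event loop / user-level scheduler here
            });
        }
        for (auto& t : shards) t.join();
    }

With this layout the kernel scheduler has essentially nothing to decide:
each core always runs its single shard thread.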

More below


> *that support scheduler activation(LWP's) and upcall mechanism. Even then
> it is hard to say if it is all worth it (The research shows performance may
> not outweigh the complexity). Golang problem is exactly this if one creates
> 1000 go routines/green threads where each of them is making a blocking
> system call then it would create 1000 kernel threads underneath because it
> has no way to know that the kernel thread is blocked (no upcall). And in
> non-blocking case I still don't even see a significant performance when
> compared to few kernel threads with callback mechanism.  If you are saying
> user level scheduling is the Future (perhaps I would just let the
> researchers argue about it) As of today that is not case else languages
> would have had it natively instead of using third party frameworks or
> libraries. *
>

That's why we do not run blocking system calls at all. We had to limit
ourselves to the XFS filesystem, since the others did not have proper AIO
support. Recently we worked around some of the issues that made EXT4 block,
and it may now be OK with our AIO pattern.
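For reference, a minimal sketch of the kernel AIO pattern being described,
using libaio with O_DIRECT (the combination that behaves well on XFS). The
file path is made up and error handling is minimal; this is not Scylla's
actual I/O code:

    // Linux kernel AIO (libaio) with O_DIRECT. Build: g++ aio_read.cc -laio
    #include <libaio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdlib>
    #include <cstdio>

    int main() {
        int fd = open("/var/lib/scylla/data.bin", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        io_context_t ctx{};
        io_setup(128, &ctx);                       // queue depth of 128

        void* buf;
        posix_memalign(&buf, 4096, 4096);          // O_DIRECT needs aligned buffers

        iocb cb;
        iocb* cbs[1] = { &cb };
        io_prep_pread(&cb, fd, buf, 4096, 0);      // read 4 KiB at offset 0
        io_submit(ctx, 1, cbs);                    // returns immediately

        // ... the reactor keeps running other tasks here ...

        io_event events[1];
        io_getevents(ctx, 1, 1, events, nullptr);  // reap the completion
        printf("read %ld bytes\n", (long)events[0].res);

        io_destroy(ctx);
        close(fd);
        free(buf);
    }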

We even wrote a DNS implementation that doesn't block and doesn't lock (for
us, even a library that uses spin locks under the hood is bad).

Bear in mind that the whole thing is simple to run, and the user doesn't
need to know anything about this complexity.




>
>
>> There are also implications on the concurrency primitives in use (locks
>> etc.) -- they will be much faster for the user-level scheduler, because
>> they cooperate with the scheduler.  For example, no atomic
>> read-modify-write instructions need to be executed.
>>
>
>
>  Second, how many (kernel) threads should you run?* This question one
> will always have. If there are 10K user level threads that maps to only one
> kernel thread then they cannot exploit parallelism. so there is no right
> answer but a thread per core is a reasonable/good choice. *
>

+1


>
>
>> If you run too few threads, then you will not be able to saturate the CPU
>> resources.  This is a common problem with Cassandra -- it's very hard to
>> get it to consume all of the CPU power on even a moderately large machine.
>> On the other hand, if you have too many threads, you will see latency rise
>> very quickly, because kernel scheduling granularity is on the order of
>> milliseconds.  User-level scheduling, because it leaves control in the hand
>> of the application, allows you to both saturate the CPU and maintain low
>> latency.
>>
>
> F*or my workload and probably others I had seen Cassandra was always
> been CPU bound.*
>

Could be. However, try to make it CPU bound on 10 cores, 20 cores and more.
The more cores you use, the fewer nodes you need, and the overall overhead
decreases.


>
>> There are other factors, like NUMA-friendliness, but in the end it all
>> boils down to efficiency and control.
>>
>> None of this is new btw, it's pretty common in the storage world.
>>
>> Avi
>>
>>
>> On 03/11/2017 11:18 PM, 

Re: scylladb

2017-03-11 Thread benjamin roth
There is no reason to be angry. This is progress. This is the circle of
life.

It happens everywhere, at any time.

On 12.03.2017 at 07:34, "Dor Laor" wrote:

> On Sat, Mar 11, 2017 at 10:02 PM, Jeff Jirsa  wrote:
>
>>
>>
>> On 2017-03-10 09:57 (-0800), Rakesh Kumar wrote:
>> > Cassanda vs Scylla is a valid comparison because they both are
>> compatible. Scylla is a drop-in replacement for Cassandra.
>>
>> No, they aren't, and no, it isn't
>>
>
> Jeff is angry with us for some reason. I don't know why; it's natural that
> when a new competitor appears there are objections, and the burden of proof
> lies on us. We go to great lengths to meet it, and we don't just throw out
> comments without backing them up.
>
> Scylla IS a drop-in replacement for C*. We support the same CQL (from
> version 1.7 it's CQL 3.3.1, protocol v4) and the same SSTable format (based
> on 2.1.8). In the 1.7 release we support the CQL uploader from 3.x. We will
> support the SSTable format of 3.x natively in three months' time. Soon the
> whole feature set will be implemented. We have always been using this page
> (not 100% up to date, we'll update it this week):
> http://www.scylladb.com/technology/status/
>
> We added a jmx-proxy daemon in Java in order to make the transition as
> smooth as possible. Almost all the nodetool commands just work, certainly
> all the important ones.
> Btw: we have a REST API and Prometheus formats, much better than the hairy
> JMX one.
>
> Spark, KairosDB, Presto and probably Titan work as well (we added Thrift
> just for legacy users, and we don't intend to decommission an API).
>
> Regarding benchmarks, if someone finds a flaw in them, we'll do our best
> to fix it.
> Or let's ignore them and just hear what our users have to say:
> http://www.scylladb.com/users/
>
>
>


Re: scylladb

2017-03-11 Thread Dor Laor
On Sat, Mar 11, 2017 at 10:02 PM, Jeff Jirsa  wrote:

>
>
> On 2017-03-10 09:57 (-0800), Rakesh Kumar wrote:
> > Cassanda vs Scylla is a valid comparison because they both are
> compatible. Scylla is a drop-in replacement for Cassandra.
>
> No, they aren't, and no, it isn't
>

Jeff is angry with us for some reason. I don't know why; it's natural that
when a new competitor appears there are objections, and the burden of proof
lies on us. We go to great lengths to meet it, and we don't just throw out
comments without backing them up.

Scylla IS a drop-in replacement for C*. We support the same CQL (from
version 1.7 it's CQL 3.3.1, protocol v4) and the same SSTable format (based
on 2.1.8). In the 1.7 release we support the CQL uploader from 3.x. We will
support the SSTable format of 3.x natively in three months' time. Soon the
whole feature set will be implemented. We have always been using this page
(not 100% up to date, we'll update it this week):
http://www.scylladb.com/technology/status/

We added a jmx-proxy daemon in Java in order to make the transition as smooth
as possible. Almost all the nodetool commands just work, certainly all the
important ones.
Btw: we have a REST API and Prometheus formats, much better than the hairy
JMX one.

Spark, KairosDB, Presto and probably Titan work as well (we added Thrift just
for legacy users, and we don't intend to decommission an API).

Regarding benchmarks, if someone finds a flaw in them, we'll do our best to
fix it.
Or let's ignore them and just hear what our users have to say:
http://www.scylladb.com/users/


Re: scylladb

2017-03-11 Thread benjamin roth
Why?

On 12.03.2017 at 07:02, "Jeff Jirsa" wrote:

>
>
> On 2017-03-10 09:57 (-0800), Rakesh Kumar wrote:
> > Cassanda vs Scylla is a valid comparison because they both are
> compatible. Scylla is a drop-in replacement for Cassandra.
>
> No, they aren't, and no, it isn't
>
>
>
>
>


Re: scylladb

2017-03-11 Thread Jeff Jirsa


On 2017-03-10 09:57 (-0800), Rakesh Kumar wrote: 
> Cassanda vs Scylla is a valid comparison because they both are compatible. 
> Scylla is a drop-in replacement for Cassandra.

No, they aren't, and no, it isn't






Re: scylladb

2017-03-11 Thread Edward Capriolo
On Sat, Mar 11, 2017 at 9:41 PM, daemeon reiydelle 
wrote:

> Recall that garbage collection on a busy node can occur minutes or seconds
> apart. Note that stop the world GC also happens as frequently as every
> couple of minutes on every node. Remove that and do the simple arithmetic.
>
>
> sent from my mobile
> Daemeon Reiydelle
> skype daemeon.c.m.reiydelle
> USA 415.501.0198
>
> On Mar 10, 2017 8:59 AM, "Bhuvan Rawal"  wrote:
>
>> Agreed C++ gives an added advantage to talk to underlying hardware with
>> better efficiency, it sound good but can a pice of code written in C++ give
>> 1000% throughput than a Java app? Is TPC design 10X more performant than
>> SEDA arch?
>>
>> And if C/C++ is indeed that fast how can Aerospike (which is itself
>> written in C) claim to be 10X faster than Scylla here
>> http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining
>> your's and aerospike's benchmarks it appears that Aerospike is 100X
>> performant than C* - I highly doubt that!! )
>>
>> For a moment lets forget about evaluating 2 different databases, one can
>> observe 10X performance difference between a mistuned cassandra cluster and
>> one thats tuned as per data model - there are so many Tunables in yaml as
>> well as table configs.
>>
>> Idea is - in order to strengthen your claim, you need to provide complete
>> system metrics (Disk, CPU, Network), the OPS increase starts to decay along
>> with the configs used. Having plain ops per second and 99p latency is
>> blackbox.
>>
>> Regards,
>> Bhuvan
>>
>> On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity  wrote:
>>
>>> ScyllaDB engineer here.
>>>
>>> C++ is really an enabling technology here. It is directly responsible
>>> for a small fraction of the gain by executing faster than Java.  But it is
>>> indirectly responsible for the gain by allowing us direct control over
>>> memory and threading.  Just as an example, Scylla starts by taking over
>>> almost all of the machine's memory, and dynamically assigning it to
>>> memtables, cache, and working memory needed to handle requests in flight.
>>> Memory is statically partitioned across cores, allowing us to exploit NUMA
>>> fully.  You can't do these things in Java.
>>>
>>> I would say the major contributors to Scylla performance are:
>>>  - thread-per-core design
>>>  - replacement of the page cache with a row cache
>>>  - careful attention to many small details, each contributing a little,
>>> but with a large overall impact
>>>
>>> While I'm here I can say that performance is not the only goal here, it
>>> is stable and predictable performance over varying loads and during
>>> maintenance operations like repair, without any special tuning.  We measure
>>> the amount of CPU and I/O spent on foreground (user) and background
>>> (maintenance) tasks and divide them fairly.  This work is not complete but
>>> already makes operating Scylla a lot simpler.
>>>
>>>
>>> On 03/10/2017 01:42 AM, Kant Kodali wrote:
>>>
>>> I dont think ScyllaDB performance is because of C++. The design
>>> decisions in scylladb are indeed different from Cassandra such as getting
>>> rid of SEDA and moving to TPC and so on.
>>>
>>> If someone thinks it is because of C++ then just show the benchmarks
>>> that proves it is indeed the C++ which gave 10X performance boost as
>>> ScyllaDB claims instead of stating it.
>>>
>>>
>>> On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III <
>>> mrbur...@gmail.com> wrote:
>>>
 They spend an enormous amount of time focusing on performance. You can
 expect them to continue on with their optimization and keep crushing it.

 P.S., I don't work for ScyllaDB.

 On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar <
 rakeshkumar...@outlook.com> wrote:

> In all of their presentation they keep harping on the fact that
> scylladb is written in C++ and does not carry the overhead of Java.  Still
> the difference looks staggering.
> 
> From: daemeon reiydelle 
> Sent: Thursday, March 9, 2017 14:21
> To: user@cassandra.apache.org
> Subject: Re: scylladb
>
> The comparison is fair, and conservative. Did substantial performance
> comparisons for two clients, both results returned throughputs that were
> faster than the published comparisons (15x as I recall). At that time the
> client preferred to utilize a Cass COTS solution and use a caching 
> solution
> for OLA compliance.
>
>
> ...
>
> Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872
>
> On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen > wrote:
> I was wondering how people feel about the comparison that's made here
> between Cassandra and ScyllaDB : 

Re: scylladb

2017-03-11 Thread daemeon reiydelle
Recall that garbage collection on a busy node can occur minutes or seconds
apart. Note that stop-the-world GC also happens as frequently as every
couple of minutes on every node. Remove that and do the simple arithmetic.


sent from my mobile
Daemeon Reiydelle
skype daemeon.c.m.reiydelle
USA 415.501.0198

On Mar 10, 2017 8:59 AM, "Bhuvan Rawal"  wrote:

> Agreed C++ gives an added advantage to talk to underlying hardware with
> better efficiency, it sound good but can a pice of code written in C++ give
> 1000% throughput than a Java app? Is TPC design 10X more performant than
> SEDA arch?
>
> And if C/C++ is indeed that fast how can Aerospike (which is itself
> written in C) claim to be 10X faster than Scylla here
> http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining your's
> and aerospike's benchmarks it appears that Aerospike is 100X performant
> than C* - I highly doubt that!! )
>
> For a moment lets forget about evaluating 2 different databases, one can
> observe 10X performance difference between a mistuned cassandra cluster and
> one thats tuned as per data model - there are so many Tunables in yaml as
> well as table configs.
>
> Idea is - in order to strengthen your claim, you need to provide complete
> system metrics (Disk, CPU, Network), the OPS increase starts to decay along
> with the configs used. Having plain ops per second and 99p latency is
> blackbox.
>
> Regards,
> Bhuvan
>
> On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity  wrote:
>
>> ScyllaDB engineer here.
>>
>> C++ is really an enabling technology here. It is directly responsible for
>> a small fraction of the gain by executing faster than Java.  But it is
>> indirectly responsible for the gain by allowing us direct control over
>> memory and threading.  Just as an example, Scylla starts by taking over
>> almost all of the machine's memory, and dynamically assigning it to
>> memtables, cache, and working memory needed to handle requests in flight.
>> Memory is statically partitioned across cores, allowing us to exploit NUMA
>> fully.  You can't do these things in Java.
>>
>> I would say the major contributors to Scylla performance are:
>>  - thread-per-core design
>>  - replacement of the page cache with a row cache
>>  - careful attention to many small details, each contributing a little,
>> but with a large overall impact
>>
>> While I'm here I can say that performance is not the only goal here, it
>> is stable and predictable performance over varying loads and during
>> maintenance operations like repair, without any special tuning.  We measure
>> the amount of CPU and I/O spent on foreground (user) and background
>> (maintenance) tasks and divide them fairly.  This work is not complete but
>> already makes operating Scylla a lot simpler.
>>
>>
>> On 03/10/2017 01:42 AM, Kant Kodali wrote:
>>
>> I dont think ScyllaDB performance is because of C++. The design decisions
>> in scylladb are indeed different from Cassandra such as getting rid of SEDA
>> and moving to TPC and so on.
>>
>> If someone thinks it is because of C++ then just show the benchmarks that
>> proves it is indeed the C++ which gave 10X performance boost as ScyllaDB
>> claims instead of stating it.
>>
>>
>> On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III > > wrote:
>>
>>> They spend an enormous amount of time focusing on performance. You can
>>> expect them to continue on with their optimization and keep crushing it.
>>>
>>> P.S., I don't work for ScyllaDB.
>>>
>>> On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar >> > wrote:
>>>
 In all of their presentation they keep harping on the fact that
 scylladb is written in C++ and does not carry the overhead of Java.  Still
 the difference looks staggering.
 
 From: daemeon reiydelle 
 Sent: Thursday, March 9, 2017 14:21
 To: user@cassandra.apache.org
 Subject: Re: scylladb

 The comparison is fair, and conservative. Did substantial performance
 comparisons for two clients, both results returned throughputs that were
 faster than the published comparisons (15x as I recall). At that time the
 client preferred to utilize a Cass COTS solution and use a caching solution
 for OLA compliance.


 ...

 Daemeon C.M. Reiydelle
 USA (+1) 415.501.0198
 London (+44) (0) 20 8144 9872

 On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen >> ro...@us2.nl>> wrote:
 I was wondering how people feel about the comparison that's made here
 between Cassandra and ScyllaDB : http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes

 They are claiming a 10x improvement, is that a fair comparison or maybe
 a somewhat coloured 

Re: scylladb

2017-03-11 Thread Kant Kodali
My response is inline.

On Sat, Mar 11, 2017 at 1:43 PM, Avi Kivity  wrote:

> There are several issues at play here.
>
> First, a database runs a large number of concurrent operations, each of
> which only consumes a small amount of CPU. The high concurrency is need to
> hide latency: disk latency, or the latency of contacting a remote node.
>

*Ok, so you are talking about hiding I/O latency. If all these I/Os are
non-blocking system calls, then a thread per core and a callback mechanism
should suffice, shouldn't it?*
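For context, "a thread per core and a callback mechanism" usually means an
epoll-style event loop on each core; a minimal illustrative sketch (not any
particular database's code):

    // One thread per core + callbacks: non-blocking fds registered with
    // epoll, and a callback invoked when each becomes readable.
    // Illustrative only; error handling omitted.
    #include <sys/epoll.h>
    #include <unistd.h>
    #include <functional>
    #include <unordered_map>

    struct EventLoop {
        int epfd = epoll_create1(0);
        std::unordered_map<int, std::function<void()>> on_readable;

        void watch(int fd, std::function<void()> cb) {
            on_readable[fd] = std::move(cb);
            epoll_event ev{};
            ev.events = EPOLLIN;
            ev.data.fd = fd;
            epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
        }

        // The per-core loop: wait until something is ready, run callbacks.
        void run() {
            epoll_event events[64];
            for (;;) {
                int n = epoll_wait(epfd, events, 64, -1);
                for (int i = 0; i < n; ++i)
                    on_readable[events[i].data.fd]();
            }
        }
    };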


> This means that the scheduler will need to switch contexts very often. A
> kernel thread scheduler knows very little about the application, so it has
> to switch a lot of context.  A user level scheduler is tightly bound to the
> application, so it can perform the switching faster.
>

*Sure, but this applies in the other direction as well. A user-level scheduler
has no idea about the kernel-level scheduler either. There is literally no
coordination between the kernel-level scheduler and the user-level scheduler
in Linux or any major OS. It may be possible with OSes that support scheduler
activations (LWPs) and an upcall mechanism. Even then it is hard to say
whether it is all worth it (the research shows the performance may not
outweigh the complexity). Golang's problem is exactly this: if one creates
1000 goroutines/green threads where each of them makes a blocking system
call, it would create 1000 kernel threads underneath, because it has no way
to know that the kernel thread is blocked (no upcall). And in the
non-blocking case I still don't see a significant performance gain compared
to a few kernel threads with a callback mechanism. If you are saying
user-level scheduling is the future (perhaps I would just let the researchers
argue about it), as of today that is not the case, else languages would have
it natively instead of relying on third-party frameworks or libraries.*


> There are also implications on the concurrency primitives in use (locks
> etc.) -- they will be much faster for the user-level scheduler, because
> they cooperate with the scheduler.  For example, no atomic
> read-modify-write instructions need to be executed.
>


 Second, how many (kernel) threads should you run? *This question one will
always have. If there are 10K user-level threads that map to only one kernel
thread, then they cannot exploit parallelism. So there is no right answer,
but a thread per core is a reasonable/good choice.*


> If you run too few threads, then you will not be able to saturate the CPU
> resources.  This is a common problem with Cassandra -- it's very hard to
> get it to consume all of the CPU power on even a moderately large machine.
> On the other hand, if you have too many threads, you will see latency rise
> very quickly, because kernel scheduling granularity is on the order of
> milliseconds.  User-level scheduling, because it leaves control in the hand
> of the application, allows you to both saturate the CPU and maintain low
> latency.
>

*For my workload, and probably for others I have seen, Cassandra has always
been CPU bound.*

>
> There are other factors, like NUMA-friendliness, but in the end it all
> boils down to efficiency and control.
>
> None of this is new btw, it's pretty common in the storage world.
>
> Avi
>
>
> On 03/11/2017 11:18 PM, Kant Kodali wrote:
>
> Here is the Java version http://docs.paralleluniverse.co/quasar/ but I
> still don't see how user level scheduling can be beneficial (This is a well
> debated problem)? How can this add to the performance? or say why is user
> level scheduling necessary Given the Thread per core design and the
> callback mechanism?
>
> On Sat, Mar 11, 2017 at 12:51 PM, Avi Kivity  wrote:
>
>> Scylla uses a the seastar framework, which provides for both user-level
>> thread scheduling and simple run-to-completion tasks.
>>
>> Huge pages are limited to 2MB (and 1GB, but these aren't available as
>> transparent hugepages).
>>
>>
>> On 03/11/2017 10:26 PM, Kant Kodali wrote:
>>
>> @Dor
>>
>> 1) You guys have a CPU scheduler? you mean user level thread Scheduler
>> that maps user level threads to kernel level threads? I thought C++ by
>> default creates native kernel threads but sure nothing will stop someone to
>> create a user level scheduling library if that's what you are talking about?
>> 2) How can one create THP of size 1KB? According to this post
>> 
>>  it
>> looks like the valid values 2MB and 1GB.
>>
>> Thanks,
>> kant
>>
>> On Sat, Mar 11, 2017 at 11:41 AM, Avi Kivity  wrote:
>>
>>> Agreed, I'd recommend to treat benchmarks as a rough guide to see where
>>> there is potential, and follow through with your own tests.
>>>
>>> On 03/11/2017 09:37 PM, Edward Capriolo wrote:
>>>
>>>
>>> Benchmarks are great for FUDly blog posts. Real world work loads matter
>>> more. Every NoSQL 

Re: Row cache tuning

2017-03-11 Thread Matija Gobec
Hi,

In 99% of use cases Cassandra's row cache is not something you should look
into. Leveraging the page cache yields good results and, if accounted for,
can give you a performance increase on the read side.
I'm not a fan of the default row cache implementation and its invalidation
mechanism on updates, so you really need to be careful when and how you use
it. It isn't so much about configuration as it is about your use case. Maybe
explain what you are trying to solve with the row cache, and people can get
into the discussion with more context.

Regards,
Matija

On Sat, Mar 11, 2017 at 9:15 PM, preetika tyagi 
wrote:

> Hi,
>
> I'm new to Cassandra and trying to get a better understanding on how the
> row cache can be tuned to optimize the performance.
>
> I came across think this article: https://docs.
> datastax.com/en/cassandra/3.0/cassandra/operations/
> opsConfiguringCaches.html
>
> And it suggests not to even touch row cache unless read workload is > 95%
> and mostly rely on machine's default cache mechanism which comes with OS.
>
> The default row cache size is 0 in cassandra.yaml file so the row cache
> won't be utilized at all.
>
> Therefore, I'm wondering how exactly I can decide to chose to tweak row
> cache if needed. Are there any good pointers one can provide on this?
>
> Thanks,
> Preetika
>


Re: scylladb

2017-03-11 Thread Avi Kivity

There are several issues at play here.

First, a database runs a large number of concurrent operations, each of 
which only consumes a small amount of CPU. The high concurrency is needed 
to hide latency: disk latency, or the latency of contacting a remote 
node. This means that the scheduler will need to switch contexts very 
often. A kernel thread scheduler knows very little about the 
application, so it has to switch a lot of context.  A user level 
scheduler is tightly bound to the application, so it can perform the 
switching faster.  There are also implications on the concurrency 
primitives in use (locks etc.) -- they will be much faster for the 
user-level scheduler, because they cooperate with the scheduler.  For 
example, no atomic read-modify-write instructions need to be executed.
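To illustrate the last point, here is a minimal sketch of a semaphore for a
single-threaded, cooperatively scheduled shard. Because tasks never run
concurrently and only switch at well-defined points, acquire and release are
plain loads and stores, with no atomic read-modify-write and no memory
fences. This is illustrative only, not Seastar's actual primitive:

    // Cooperative semaphore for a shard where all tasks run on one pinned
    // thread and only yield at well-defined points. No preemption, no
    // cross-thread sharing, therefore no atomics needed.
    #include <deque>
    #include <functional>

    class CoopSemaphore {
        int units_;
        std::deque<std::function<void()>> waiters_;  // parked continuations
    public:
        explicit CoopSemaphore(int units) : units_(units) {}

        // Take a unit if one is free; otherwise park the continuation.
        // Plain integer arithmetic -- no compare-and-swap, no lock prefix.
        void acquire(std::function<void()> cont) {
            if (units_ > 0) {
                --units_;
                cont();                       // run immediately
            } else {
                waiters_.push_back(std::move(cont));
            }
        }

        void release() {
            if (!waiters_.empty()) {
                auto next = std::move(waiters_.front());
                waiters_.pop_front();
                next();                       // hand the unit straight over
            } else {
                ++units_;
            }
        }
    };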


Second, how many (kernel) threads should you run?  If you run too few 
threads, then you will not be able to saturate the CPU resources.  This 
is a common problem with Cassandra -- it's very hard to get it to 
consume all of the CPU power on even a moderately large machine.  On the 
other hand, if you have too many threads, you will see latency rise very 
quickly, because kernel scheduling granularity is on the order of 
milliseconds. User-level scheduling, because it leaves control in the 
hand of the application, allows you to both saturate the CPU and 
maintain low latency.


There are other factors, like NUMA-friendliness, but in the end it all 
boils down to efficiency and control.


None of this is new btw, it's pretty common in the storage world.

Avi

On 03/11/2017 11:18 PM, Kant Kodali wrote:
Here is the Java version http://docs.paralleluniverse.co/quasar/ but I 
still don't see how user level scheduling can be beneficial (This is a 
well debated problem)? How can this add to the performance? or say why 
is user level scheduling necessary Given the Thread per core design 
and the callback mechanism?


On Sat, Mar 11, 2017 at 12:51 PM, Avi Kivity > wrote:


Scylla uses a the seastar framework, which provides for both
user-level thread scheduling and simple run-to-completion tasks.

Huge pages are limited to 2MB (and 1GB, but these aren't available
as transparent hugepages).


On 03/11/2017 10:26 PM, Kant Kodali wrote:

@Dor

1) You guys have a CPU scheduler? you mean user level thread
Scheduler that maps user level threads to kernel level threads? I
thought C++ by default creates native kernel threads but sure
nothing will stop someone to create a user level scheduling
library if that's what you are talking about?
2) How can one create THP of size 1KB? According to this post


 it
looks like the valid values 2MB and 1GB.

Thanks,
kant

On Sat, Mar 11, 2017 at 11:41 AM, Avi Kivity > wrote:

Agreed, I'd recommend to treat benchmarks as a rough guide to
see where there is potential, and follow through with your
own tests.

On 03/11/2017 09:37 PM, Edward Capriolo wrote:


Benchmarks are great for FUDly blog posts. Real world work
loads matter more. Every NoSQL vendor wins their benchmarks.













Re: scylladb

2017-03-11 Thread Kant Kodali
Here is the Java version, http://docs.paralleluniverse.co/quasar/, but I
still don't see how user-level scheduling can be beneficial (this is a
well-debated problem). How can this add to the performance? Or, put
differently, why is user-level scheduling necessary, given the
thread-per-core design and the callback mechanism?

On Sat, Mar 11, 2017 at 12:51 PM, Avi Kivity  wrote:

> Scylla uses a the seastar framework, which provides for both user-level
> thread scheduling and simple run-to-completion tasks.
>
> Huge pages are limited to 2MB (and 1GB, but these aren't available as
> transparent hugepages).
>
>
> On 03/11/2017 10:26 PM, Kant Kodali wrote:
>
> @Dor
>
> 1) You guys have a CPU scheduler? you mean user level thread Scheduler
> that maps user level threads to kernel level threads? I thought C++ by
> default creates native kernel threads but sure nothing will stop someone to
> create a user level scheduling library if that's what you are talking about?
> 2) How can one create THP of size 1KB? According to this post
> 
>  it
> looks like the valid values 2MB and 1GB.
>
> Thanks,
> kant
>
> On Sat, Mar 11, 2017 at 11:41 AM, Avi Kivity  wrote:
>
>> Agreed, I'd recommend to treat benchmarks as a rough guide to see where
>> there is potential, and follow through with your own tests.
>>
>> On 03/11/2017 09:37 PM, Edward Capriolo wrote:
>>
>>
>> Benchmarks are great for FUDly blog posts. Real world work loads matter
>> more. Every NoSQL vendor wins their benchmarks.
>>
>>
>>
>
>
>
>


Re: scylladb

2017-03-11 Thread Avi Kivity
Scylla uses the Seastar framework, which provides for both user-level 
thread scheduling and simple run-to-completion tasks.


Huge pages are limited to 2MB (and 1GB, but these aren't available as 
transparent hugepages).
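For reference, the two ways to get 2MB pages on Linux: hint the kernel to
back an ordinary anonymous mapping with transparent huge pages, or map
explicit hugetlb pages. A small illustrative sketch:

    // Two ways to get 2 MiB pages on Linux. Illustrative only.
    #include <sys/mman.h>
    #include <cstdio>

    int main() {
        const size_t len = 1 << 21;  // 2 MiB

        // 1) Transparent huge pages: an ordinary anonymous mapping, with a
        //    hint that the kernel should back it with 2 MiB pages if it can.
        void* a = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        madvise(a, len, MADV_HUGEPAGE);

        // 2) Explicit hugetlb pages (requires pre-reserved pages, e.g.
        //    /proc/sys/vm/nr_hugepages > 0); fails rather than falling back.
        void* b = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (b == MAP_FAILED) perror("hugetlb mmap");

        munmap(a, len);
        if (b != MAP_FAILED) munmap(b, len);
    }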


On 03/11/2017 10:26 PM, Kant Kodali wrote:

@Dor

1) You guys have a CPU scheduler? you mean user level thread Scheduler 
that maps user level threads to kernel level threads? I thought C++ by 
default creates native kernel threads but sure nothing will stop 
someone to create a user level scheduling library if that's what you 
are talking about?
2) How can one create THP of size 1KB? According to this post 
 it 
looks like the valid values 2MB and 1GB.


Thanks,
kant

On Sat, Mar 11, 2017 at 11:41 AM, Avi Kivity > wrote:


Agreed, I'd recommend to treat benchmarks as a rough guide to see
where there is potential, and follow through with your own tests.

On 03/11/2017 09:37 PM, Edward Capriolo wrote:


Benchmarks are great for FUDly blog posts. Real world work loads
matter more. Every NoSQL vendor wins their benchmarks.










Re: scylladb

2017-03-11 Thread Kant Kodali
@Dor

1) You guys have a CPU scheduler? You mean a user-level thread scheduler that
maps user-level threads to kernel-level threads? I thought C++ by default
creates native kernel threads, but sure, nothing will stop someone from
creating a user-level scheduling library, if that's what you are talking
about.
2) How can one create a THP of size 1KB? According to this post it looks like
the valid values are 2MB and 1GB.

Thanks,
kant

On Sat, Mar 11, 2017 at 11:41 AM, Avi Kivity  wrote:

> Agreed, I'd recommend to treat benchmarks as a rough guide to see where
> there is potential, and follow through with your own tests.
>
> On 03/11/2017 09:37 PM, Edward Capriolo wrote:
>
>
> Benchmarks are great for FUDly blog posts. Real world work loads matter
> more. Every NoSQL vendor wins their benchmarks.
>
>
>


Row cache tuning

2017-03-11 Thread preetika tyagi
Hi,

I'm new to Cassandra and trying to get a better understanding of how the
row cache can be tuned to optimize performance.

I came across this article:
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsConfiguringCaches.html

It suggests not even touching the row cache unless the workload is > 95%
reads, and mostly relying on the machine's default cache mechanism, which
comes with the OS.

The default row cache size is 0 in the cassandra.yaml file, so the row cache
won't be utilized at all.

Therefore, I'm wondering how exactly I should decide whether to tweak the row
cache if needed. Are there any good pointers one can provide on this?

Thanks,
Preetika


Re: scylladb

2017-03-11 Thread Avi Kivity
Agreed, I'd recommend treating benchmarks as a rough guide to see where 
there is potential, and following through with your own tests.


On 03/11/2017 09:37 PM, Edward Capriolo wrote:


Benchmarks are great for FUDly blog posts. Real world work loads 
matter more. Every NoSQL vendor wins their benchmarks.





Re: scylladb

2017-03-11 Thread Avi Kivity

Here's a test (by Samsung MSL) comparing Scylla to Cassandra 3.9:

http://www.scylladb.com/2017/02/15/scylladb-vs-cassandra-performance-benchmark-samsung/

there's a link at the end to the original report.

On 03/11/2017 09:08 PM, Bhuvan Rawal wrote:
"Lastly, why don't you test Scylla yourself?  It's pretty easy to set 
up, there's nothing to tune."
 - The details are indeed compelling to have a go ahead and test it 
for specific use case.


If it works out good it can lead to good cost cut in infra costs as 
well as having to manage less servers plus probably less time to 
bootstrap & decommission nodes!


It will also be interesting to have a benchmark with Cassandra 3 
version as well, as the new storage engine is said to have better 
performance:
https://www.datastax.com/2015/12/storage-engine-30 



Regards,
Bhuvan

On Sat, Mar 11, 2017 at 2:59 PM, Avi Kivity > wrote:


There is no magic 10X bullet.  It's a mix of multiple factors,
which can come up to less than 10X in some circumstances and more
than 10X in others, as has been reported on this thread by others.

TPC doesn't give _any_ advantage when you have just one core, and
can give more than 10X on a machine with a large number of cores. 
These are becoming more and more common, think of the recent AMD

Naples announcement; with 32 cores per socket you can have 128
logical cores in a two-socket server; or the AWS i3.16xlarge
instance with 32 cores / 64 vcpus.

You're welcome to browse our site to learn more about the
architecture, or watch this technical talk [1] I gave in QConSF
that highlights some of the techniques we use.

Of course it's possible to mistune Cassandra to give bad results,
that is why we spent a lot more time tuning Cassandra and
documenting everything than we spent on Scylla.  You can read the
report in [2], it is very detailed, and provides a wealth of
metrics like you'd expect.

I'm not going to comment about the Aerospike numbers, I haven't
studied them in detail.  And no, you can't multiply results like
that unless they were done with very similar configurations and
test harnesses.

Lastly, why don't you test Scylla yourself?  It's pretty easy to
set up, there's nothing to tune.

Avi

[1] https://www.infoq.com/presentations/scylladb

[2]
http://www.scylladb.com/technology/cassandra-vs-scylla-benchmark-cluster-1/





On 03/10/2017 06:58 PM, Bhuvan Rawal wrote:

Agreed C++ gives an added advantage to talk to underlying
hardware with better efficiency, it sound good but can a pice of
code written in C++ give 1000% throughput than a Java app? Is TPC
design 10X more performant than SEDA arch?

And if C/C++ is indeed that fast how can Aerospike (which is
itself written in C) claim to be 10X faster than Scylla here
http://www.aerospike.com/benchmarks/scylladb-initial/
 ?
(Combining your's and aerospike's benchmarks it appears that
Aerospike is 100X performant than C* - I highly doubt that!! )

For a moment lets forget about evaluating 2 different databases,
one can observe 10X performance difference between a mistuned
cassandra cluster and one thats tuned as per data model - there
are so many Tunables in yaml as well as table configs.

Idea is - in order to strengthen your claim, you need to provide
complete system metrics (Disk, CPU, Network), the OPS increase
starts to decay along with the configs used. Having plain ops per
second and 99p latency is blackbox.

Regards,
Bhuvan

On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity > wrote:

ScyllaDB engineer here.

C++ is really an enabling technology here. It is directly
responsible for a small fraction of the gain by executing
faster than Java.  But it is indirectly responsible for the
gain by allowing us direct control over memory and
threading.  Just as an example, Scylla starts by taking over
almost all of the machine's memory, and dynamically assigning
it to memtables, cache, and working memory needed to handle
requests in flight.  Memory is statically partitioned across
cores, allowing us to exploit NUMA fully.  You can't do these
things in Java.

I would say the major contributors to Scylla performance are:
 - thread-per-core design
 - replacement of the page cache with a row cache
 - careful attention to many small details, each contributing
a little, but with a large overall impact

While I'm here I can say that 

Re: scylladb

2017-03-11 Thread Edward Capriolo
On Sat, Mar 11, 2017 at 2:08 PM, Bhuvan Rawal  wrote:

> "Lastly, why don't you test Scylla yourself?  It's pretty easy to set up,
> there's nothing to tune."
>  - The details are indeed compelling to have a go ahead and test it for
> specific use case.
>
> If it works out good it can lead to good cost cut in infra costs as well
> as having to manage less servers plus probably less time to bootstrap &
> decommission nodes!
>
> It will also be interesting to have a benchmark with Cassandra 3 version
> as well, as the new storage engine is said to have better performance:
> https://www.datastax.com/2015/12/storage-engine-30
>
> Regards,
> Bhuvan
>
> On Sat, Mar 11, 2017 at 2:59 PM, Avi Kivity  wrote:
>
>> There is no magic 10X bullet.  It's a mix of multiple factors, which can
>> come up to less than 10X in some circumstances and more than 10X in others,
>> as has been reported on this thread by others.
>>
>> TPC doesn't give _any_ advantage when you have just one core, and can
>> give more than 10X on a machine with a large number of cores.  These are
>> becoming more and more common, think of the recent AMD Naples announcement;
>> with 32 cores per socket you can have 128 logical cores in a two-socket
>> server; or the AWS i3.16xlarge instance with 32 cores / 64 vcpus.
>>
>> You're welcome to browse our site to learn more about the architecture,
>> or watch this technical talk [1] I gave in QConSF that highlights some of
>> the techniques we use.
>>
>> Of course it's possible to mistune Cassandra to give bad results, that is
>> why we spent a lot more time tuning Cassandra and documenting everything
>> than we spent on Scylla.  You can read the report in [2], it is very
>> detailed, and provides a wealth of metrics like you'd expect.
>>
>> I'm not going to comment about the Aerospike numbers, I haven't studied
>> them in detail.  And no, you can't multiply results like that unless they
>> were done with very similar configurations and test harnesses.
>>
>> Lastly, why don't you test Scylla yourself?  It's pretty easy to set up,
>> there's nothing to tune.
>>
>> Avi
>>
>> [1] https://www.infoq.com/presentations/scylladb
>> [2] http://www.scylladb.com/technology/cassandra-vs-scylla-bench
>> mark-cluster-1/
>>
>>
>> On 03/10/2017 06:58 PM, Bhuvan Rawal wrote:
>>
>> Agreed C++ gives an added advantage to talk to underlying hardware with
>> better efficiency, it sound good but can a pice of code written in C++ give
>> 1000% throughput than a Java app? Is TPC design 10X more performant than
>> SEDA arch?
>>
>> And if C/C++ is indeed that fast how can Aerospike (which is itself
>> written in C) claim to be 10X faster than Scylla here
>> http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining
>> your's and aerospike's benchmarks it appears that Aerospike is 100X
>> performant than C* - I highly doubt that!! )
>>
>> For a moment lets forget about evaluating 2 different databases, one can
>> observe 10X performance difference between a mistuned cassandra cluster and
>> one thats tuned as per data model - there are so many Tunables in yaml as
>> well as table configs.
>>
>> Idea is - in order to strengthen your claim, you need to provide complete
>> system metrics (Disk, CPU, Network), the OPS increase starts to decay along
>> with the configs used. Having plain ops per second and 99p latency is
>> blackbox.
>>
>> Regards,
>> Bhuvan
>>
>> On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity  wrote:
>>
>>> ScyllaDB engineer here.
>>>
>>> C++ is really an enabling technology here. It is directly responsible
>>> for a small fraction of the gain by executing faster than Java.  But it is
>>> indirectly responsible for the gain by allowing us direct control over
>>> memory and threading.  Just as an example, Scylla starts by taking over
>>> almost all of the machine's memory, and dynamically assigning it to
>>> memtables, cache, and working memory needed to handle requests in flight.
>>> Memory is statically partitioned across cores, allowing us to exploit NUMA
>>> fully.  You can't do these things in Java.
>>>
>>> I would say the major contributors to Scylla performance are:
>>>  - thread-per-core design
>>>  - replacement of the page cache with a row cache
>>>  - careful attention to many small details, each contributing a little,
>>> but with a large overall impact
>>>
>>> While I'm here I can say that performance is not the only goal here, it
>>> is stable and predictable performance over varying loads and during
>>> maintenance operations like repair, without any special tuning.  We measure
>>> the amount of CPU and I/O spent on foreground (user) and background
>>> (maintenance) tasks and divide them fairly.  This work is not complete but
>>> already makes operating Scylla a lot simpler.
>>>
>>>
>>> On 03/10/2017 01:42 AM, Kant Kodali wrote:
>>>
>>> I dont think ScyllaDB performance is because of C++. The design
>>> decisions in scylladb are indeed 

Re: scylladb

2017-03-11 Thread Bhuvan Rawal
"Lastly, why don't you test Scylla yourself?  It's pretty easy to set up,
there's nothing to tune."
 - The details are indeed compelling enough to go ahead and test it for a
specific use case.

If it works out well, it can lead to a good cut in infra costs, as well as
fewer servers to manage, plus probably less time to bootstrap & decommission
nodes!

It will also be interesting to have a benchmark against the Cassandra 3
version as well, since the new storage engine is said to have better
performance:

Regards,
Bhuvan

On Sat, Mar 11, 2017 at 2:59 PM, Avi Kivity  wrote:

> There is no magic 10X bullet.  It's a mix of multiple factors, which can
> come up to less than 10X in some circumstances and more than 10X in others,
> as has been reported on this thread by others.
>
> TPC doesn't give _any_ advantage when you have just one core, and can give
> more than 10X on a machine with a large number of cores.  These are
> becoming more and more common, think of the recent AMD Naples announcement;
> with 32 cores per socket you can have 128 logical cores in a two-socket
> server; or the AWS i3.16xlarge instance with 32 cores / 64 vcpus.
>
> You're welcome to browse our site to learn more about the architecture, or
> watch this technical talk [1] I gave in QConSF that highlights some of the
> techniques we use.
>
> Of course it's possible to mistune Cassandra to give bad results, that is
> why we spent a lot more time tuning Cassandra and documenting everything
> than we spent on Scylla.  You can read the report in [2], it is very
> detailed, and provides a wealth of metrics like you'd expect.
>
> I'm not going to comment about the Aerospike numbers, I haven't studied
> them in detail.  And no, you can't multiply results like that unless they
> were done with very similar configurations and test harnesses.
>
> Lastly, why don't you test Scylla yourself?  It's pretty easy to set up,
> there's nothing to tune.
>
> Avi
>
> [1] https://www.infoq.com/presentations/scylladb
> [2] http://www.scylladb.com/technology/cassandra-vs-scylla-
> benchmark-cluster-1/
>
>
> On 03/10/2017 06:58 PM, Bhuvan Rawal wrote:
>
> Agreed C++ gives an added advantage to talk to underlying hardware with
> better efficiency, it sound good but can a pice of code written in C++ give
> 1000% throughput than a Java app? Is TPC design 10X more performant than
> SEDA arch?
>
> And if C/C++ is indeed that fast how can Aerospike (which is itself
> written in C) claim to be 10X faster than Scylla here
> http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining your's
> and aerospike's benchmarks it appears that Aerospike is 100X performant
> than C* - I highly doubt that!! )
>
> For a moment lets forget about evaluating 2 different databases, one can
> observe 10X performance difference between a mistuned cassandra cluster and
> one thats tuned as per data model - there are so many Tunables in yaml as
> well as table configs.
>
> Idea is - in order to strengthen your claim, you need to provide complete
> system metrics (Disk, CPU, Network), the OPS increase starts to decay along
> with the configs used. Having plain ops per second and 99p latency is
> blackbox.
>
> Regards,
> Bhuvan
>
> On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity  wrote:
>
>> ScyllaDB engineer here.
>>
>> C++ is really an enabling technology here. It is directly responsible for
>> a small fraction of the gain by executing faster than Java.  But it is
>> indirectly responsible for the gain by allowing us direct control over
>> memory and threading.  Just as an example, Scylla starts by taking over
>> almost all of the machine's memory, and dynamically assigning it to
>> memtables, cache, and working memory needed to handle requests in flight.
>> Memory is statically partitioned across cores, allowing us to exploit NUMA
>> fully.  You can't do these things in Java.
>>
>> I would say the major contributors to Scylla performance are:
>>  - thread-per-core design
>>  - replacement of the page cache with a row cache
>>  - careful attention to many small details, each contributing a little,
>> but with a large overall impact
>>
>> While I'm here I can say that performance is not the only goal here, it
>> is stable and predictable performance over varying loads and during
>> maintenance operations like repair, without any special tuning.  We measure
>> the amount of CPU and I/O spent on foreground (user) and background
>> (maintenance) tasks and divide them fairly.  This work is not complete but
>> already makes operating Scylla a lot simpler.
>>
>>
>> On 03/10/2017 01:42 AM, Kant Kodali wrote:
>>
>> I dont think ScyllaDB performance is because of C++. The design decisions
>> in scylladb are indeed different from Cassandra such as getting rid of SEDA
>> and moving to TPC and so on.
>>
>> If someone thinks it is because of C++ then just show the benchmarks that
>> proves it is indeed the C++ which 

Re: scylladb

2017-03-11 Thread Avi Kivity
There is no magic 10X bullet.  It's a mix of multiple factors, which can 
come up to less than 10X in some circumstances and more than 10X in 
others, as has been reported on this thread by others.


TPC doesn't give _any_ advantage when you have just one core, and can 
give more than 10X on a machine with a large number of cores. These are 
becoming more and more common, think of the recent AMD Naples 
announcement; with 32 cores per socket you can have 128 logical cores in 
a two-socket server; or the AWS i3.16xlarge instance with 32 cores / 64 
vcpus.


You're welcome to browse our site to learn more about the architecture, 
or watch this technical talk [1] I gave in QConSF that highlights some 
of the techniques we use.


Of course it's possible to mistune Cassandra to give bad results, that 
is why we spent a lot more time tuning Cassandra and documenting 
everything than we spent on Scylla.  You can read the report in [2], it 
is very detailed, and provides a wealth of metrics like you'd expect.


I'm not going to comment about the Aerospike numbers, I haven't studied 
them in detail.  And no, you can't multiply results like that unless 
they were done with very similar configurations and test harnesses.


Lastly, why don't you test Scylla yourself?  It's pretty easy to set up, 
there's nothing to tune.


Avi

[1] https://www.infoq.com/presentations/scylladb
[2] 
http://www.scylladb.com/technology/cassandra-vs-scylla-benchmark-cluster-1/


On 03/10/2017 06:58 PM, Bhuvan Rawal wrote:
Agreed C++ gives an added advantage to talk to underlying hardware 
with better efficiency, it sound good but can a pice of code written 
in C++ give 1000% throughput than a Java app? Is TPC design 10X more 
performant than SEDA arch?


And if C/C++ is indeed that fast how can Aerospike (which is itself 
written in C) claim to be 10X faster than Scylla here 
http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining 
your's and aerospike's benchmarks it appears that Aerospike is 100X 
performant than C* - I highly doubt that!! )


For a moment lets forget about evaluating 2 different databases, one 
can observe 10X performance difference between a mistuned cassandra 
cluster and one thats tuned as per data model - there are so many 
Tunables in yaml as well as table configs.


Idea is - in order to strengthen your claim, you need to provide 
complete system metrics (Disk, CPU, Network), the OPS increase starts 
to decay along with the configs used. Having plain ops per second and 
99p latency is blackbox.


Regards,
Bhuvan

On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity > wrote:


ScyllaDB engineer here.

C++ is really an enabling technology here. It is directly
responsible for a small fraction of the gain by executing faster
than Java.  But it is indirectly responsible for the gain by
allowing us direct control over memory and threading.  Just as an
example, Scylla starts by taking over almost all of the machine's
memory, and dynamically assigning it to memtables, cache, and
working memory needed to handle requests in flight.  Memory is
statically partitioned across cores, allowing us to exploit NUMA
fully.  You can't do these things in Java.

I would say the major contributors to Scylla performance are:
 - thread-per-core design
 - replacement of the page cache with a row cache
 - careful attention to many small details, each contributing a
little, but with a large overall impact

While I'm here I can say that performance is not the only goal
here, it is stable and predictable performance over varying loads
and during maintenance operations like repair, without any special
tuning.  We measure the amount of CPU and I/O spent on foreground
(user) and background (maintenance) tasks and divide them fairly.
This work is not complete but already makes operating Scylla a lot
simpler.


On 03/10/2017 01:42 AM, Kant Kodali wrote:

I dont think ScyllaDB performance is because of C++. The design
decisions in scylladb are indeed different from Cassandra such as
getting rid of SEDA and moving to TPC and so on.

If someone thinks it is because of C++ then just show the
benchmarks that proves it is indeed the C++ which gave 10X
performance boost as ScyllaDB claims instead of stating it.


On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III
> wrote:

They spend an enormous amount of time focusing on
performance. You can expect them to continue on with their
optimization and keep crushing it.

P.S., I don't work for ScyllaDB.

On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar
> wrote:

In all of their presentation they keep harping on the
fact that scylladb is written in C++ and 

Re: scylladb

2017-03-11 Thread benjamin roth
Thanks a lot for your detailed explanation!
I am very curious about the future development of ScyllaDB, especially
about MVs and LWT!

On 11.03.2017 at 02:05, "Dor Laor" wrote:

> On Fri, Mar 10, 2017 at 4:45 PM, Kant Kodali  wrote:
>
>> http://performanceterracotta.blogspot.com/2012/09/numa-java.html
>> http://docs.oracle.com/javase/7/docs/technotes/guides/vm/per
>> formance-enhancements-7.html
>> http://openjdk.java.net/jeps/163
>>
>>
> Java can exploit NUMA but it's not as a efficient as can be done in c++.
> Andrea Arcangeli is the engineer behind Linux transparent huge pages(THP),
> he
> reported to me and the idea belongs to Avi. We did it for KVM's sake but
> it was designed to any long running process like Cassandra.
> However, the entire software stack should be aware. If you get a huge page
> (2MB)
> but keep in it only 1KB you waste lots of mem. On top of this, threads
> need to
> touch their data structures and they need to be well aligned, otherwise
> the memory
> page will bounce between the different cores.
> With Cassandra it gets more complicated since there is a heap and off-heap
> data.
>
> Do programmers really track their data alignment? I doubt it.
> Do users run C* with the JVM numa options and the right Linux THP options?
> Again, I doubt.
>
> Scylla on the other side is designed for NUMA. We have 2-level sharding.
> The inner shards are transparent
> to the user and are per-core (hyper thread). Such a shard access RAM only
> within its numa node. Memory
> is bonded to each thread/numa node. We have our own malloc allocator built
> for this scheme.
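A minimal sketch of the kind of per-shard NUMA binding described above,
using libnuma to keep a shard's execution and allocations on one node. The
CPU number is made up, and Scylla uses its own allocator; this is
illustrative only:

    // Per-shard NUMA binding with libnuma: run the shard on one CPU and
    // allocate its memory on that CPU's NUMA node.
    // Build: g++ shard.cc -lnuma
    #include <numa.h>
    #include <cstdio>

    int main() {
        if (numa_available() < 0) { puts("no NUMA support"); return 1; }

        int cpu  = 3;                          // the core this shard is pinned to
        int node = numa_node_of_cpu(cpu);      // its NUMA node

        // Keep the shard's execution and allocations on that node.
        numa_run_on_node(node);
        void* arena = numa_alloc_onnode(64 << 20, node);   // 64 MiB local arena
        printf("shard on cpu %d, node %d, arena %p\n", cpu, node, arena);

        numa_free(arena, 64 << 20);
    }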
>
>
>
>> If scyllaDB has efficient Secondary indexes, LWT and MV's then that is
>> something. I would be glad to see how they perform.
>>
>>
> MV will be in 1.8, we haven't measured performance yet. We did measure our
> counter implementation
> and it looks promising (4X better throughput and 4X better latency on a
> 8-core machine).
> The not-written yet LWT will kick-a** since our fully async engine is
> ideal for the larger number
> of round trips the LWT needs.
>
> This is with the Linux tcp stack, once we'll use our dpdk one, performance
> will improve further ;)
>
>
>
>>
>> On Fri, Mar 10, 2017 at 10:45 AM, Dor Laor  wrote:
>>
>>> Scylla isn't just about performance too.
>>>
>>> First, a disclaimer, I am a Scylla co-founder. I respect open source a
>>> lot,
>>> so you guys are welcome to shush me out of this thread. I only
>>> participate
>>> to provide value if I can (this is a thread about Scylla and our users
>>> are
>>> on our mailing list).
>>>
>>> Scylla is all about what Cassandra is plus:
>>>  - Efficient hardware utilization (scale-up, performance)
>>>  - Low tail latency
>>>  - Auto/dynamic tuning (no JVM tuning, we tune the OS ourselves, we have
>>> cpu scheduler,
>>>I/O userspace scheduler and more to come).
>>>  - SLA between compaction, repair, streaming and your r/w operations
>>>
>>> We started with a great foundation (C*) and wish to improve almost any
>>> aspect of it.
>>> Admittedly, we're way behind C* in terms of adoption. One need to start
>>> somewhere.
>>> However, users such as AppNexus run Scylla in production with 47
>>> physical nodes
>>> across 5 datacenters and their VP estimate that C* would have at least
>>> doubled the
>>> size. So this is equal for a 100-node C* cluster. Since we have the same
>>> gossip, murmur3 hash,
>>> CQL, nothing stops us to scale to 1,000 nodes. Another user (Mogujie)
>>> run 10s of TBs per node(!)
>>> in production.
>>>
>>> Also, since we try to compare Scylla and C* in a fair way, we invested a
>>> great deal of time
>>> to run C*. I can say it's not simple at all.
>>> Lastly, in a couple of months we'll reach parity in functionality with
>>> C* (counters are in 1.7 as experimental, in 1.8 counters will be stable and
>>> we'll have MV as experimental, LWT will be
>>> in the summer). We hope to collaborate with the C* community with the
>>> development of future
>>> features.
>>>
>>> Dor
>>>
>>>
>>> On Fri, Mar 10, 2017 at 10:19 AM, Jacques-Henri Berthemet <
>>> jacques-henri.berthe...@genesys.com> wrote:
>>>
 Cassandra is not about pure performance, there are many other DBs that
 are much faster than Cassandra. Cassandra strength is all about
 scalability, performance increases in a linear way as you add more nodes.
 During Cassandra summit 2014 Apple said they have a 10k node cluster. The
 usual limiting factor is your disk write speed and latency, I don’t see how
 C++ changes anything in this regard unless you can cache all your data in
 memory.



 I’d be curious to know how ScyllaDB performs with a 100+ nodes cluster
 with PBs of data compared to Cassandra.

 *--*

 *Jacques-Henri Berthemet*



 *From:* Rakesh Kumar [mailto:rakeshkumar...@outlook.com]
 *Sent:* vendredi 10 mars 2017 09:58

 *To:*