I did some more benchmarking yesterday and today and got the following
results:
# 4 runs each of the YCSB Workload A "load" job, in QPS (inserts only)
local_nums = [72031.838, 73134.16462, 69772.715379, 72666.4971115]
raft_nums = [67037.6080981, 66876.2121313, 65779.7365522, 65876.1528327]
Min slowdown: 3.92%
Max slowdown: 10.06%
Average slowdown: 7.66%
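For the record, the slowdown figures can be reproduced from the raw QPS lists with a quick sketch like the one below. The min/max figures match comparing the worst local run against the best raft run (and vice versa), while the average compares the means of the two sets; that pairing is my reading of how they were derived, so treat it as an assumption:

```python
# Raw QPS from the 4 YCSB Workload A "load" runs above.
local_nums = [72031.838, 73134.16462, 69772.715379, 72666.4971115]
raft_nums = [67037.6080981, 66876.2121313, 65779.7365522, 65876.1528327]

def slowdown_pct(local_qps, raft_qps):
    """Percent drop in QPS going from local consensus to raft."""
    return (local_qps - raft_qps) / local_qps * 100

# Best case: slowest local run vs. fastest raft run.
min_slowdown = slowdown_pct(min(local_nums), max(raft_nums))
# Worst case: fastest local run vs. slowest raft run.
max_slowdown = slowdown_pct(max(local_nums), min(raft_nums))
# Mean-to-mean comparison.
avg_slowdown = slowdown_pct(sum(local_nums) / len(local_nums),
                            sum(raft_nums) / len(raft_nums))

print("Min slowdown: %.2f%%" % min_slowdown)      # ~3.92
print("Max slowdown: %.2f%%" % max_slowdown)      # ~10.06
print("Average slowdown: %.2f%%" % avg_slowdown)  # ~7.66
```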
So it looks like a 4-10% write slowdown on tables with replication disabled
if we remove LocalConsensus.
FWIW, this is a pure insert workload only. When comparing performance on
YCSB runs with a mixed read / write workload there is essentially no
difference.
Worth mentioning the settings used. Same hardware as before, with the
following flags:
ycsb_opts:
recordcount: 4000000
operationcount: 1000000
threads: 16
max_execution_time: 1800
load_sync: true
ts_flags:
cfile_do_on_finish: "flush"
flush_threshold_mb: "1000"
maintenance_manager_num_threads: "2"
(I also tuned election timeouts to be near zero to make leader election
instantaneous)
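For anyone wanting to reproduce this, the ts_flags above translate to a tserver invocation along these lines. The filesystem paths and the two raft_* timeout values are my assumptions (the latter being a guess at what "near zero" election timeouts means here); the other flags come straight from the list above:

```shell
# Hypothetical invocation: --fs_* paths and the raft_* timeout values
# are assumptions; the remaining flags are the ts_flags listed above.
kudu-tserver \
  --fs_wal_dir=/data/kudu/wal \
  --fs_data_dirs=/data/kudu/data \
  --cfile_do_on_finish=flush \
  --flush_threshold_mb=1000 \
  --maintenance_manager_num_threads=2 \
  --raft_heartbeat_interval_ms=50 \
  --leader_failure_max_missed_heartbeat_periods=1.0
```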
I did a quick test of migrating from a version of Kudu with support for
LocalConsensus to a version without it, and the migration worked.
What do you guys think? Is this too large of a hit to take to remove our
"fake" version of Consensus?
As mentioned previously, the drawback to keeping LocalConsensus is that
there is no way to add nodes to a system running with it, yet it's
currently the default for people who set replication factor to 1 on a
table.
Mike
On Thu, Jun 2, 2016 at 12:45 AM, Mike Percy <[email protected]> wrote:
> To spare you the wall of text let me quickly summarize the scale factor 10
> results:
>
> Insert: local avg 268 sec, raft avg 282 sec (raft has a 1% slowdown) but
> there's quite a bit of variance in there
> Query: local avg 13.99 sec, raft avg 13.53 sec (raft has a 3% speedup) but
> again, there's a bit of variance
>
> Doesn't really look any different to me.
>
> Mike
>
> On Thu, Jun 2, 2016 at 12:37 AM, Mike Percy <[email protected]> wrote:
>
>> I still have to test migration (pretty sure it's a no-op though).
>> However, I got all tests passing with LocalConsensus disabled in TabletPeer.
>>
>> To test performance, I ran TPC-H Q1 on a single node (via MiniCluster)
>> using the tpch.sh default settings (except for scale factor).
>> The summary is that the perf looks pretty similar between the two
>> Consensus implementations. I don't really see a major difference.
>>
>> Machine specs:
>>
>> CPU(s): 48 (4x6 core w/ HT)
>> RAM: 96 GB
>> OS: Centos 6.6 (final)
>> Kernel: Linux 2.6.32-504.30.3.el6.x86_64 #1 SMP Wed Jul 15 10:13:09 UTC
>> 2015 x86_64 x86_64 GNU/Linux
>>
>> The numbers:
>>
>> *INSERT*
>>
>> Consensus  Scale factor  Time (sec)  Avg (sec)  Std. dev (sec)  Ratio of averages
>> local      1              26.557     26.557     -
>> raft       1              25.843     25.843     -               1.027628371
>> local      10            271.410
>> local      10            282.738
>> local      10            283.580    279.243     6.79634029
>> raft       10            281.986
>> raft       10            281.551
>> raft       10            283.049    282.195     0.7706272337    0.9895367984
>>
>> *QUERY*
>>
>> Consensus  Scale factor  Time (sec)  Avg (sec)  Std. dev (sec)  Ratio of averages
>> local      1              1.281
>> local      1              1.325
>> local      1              1.340
>> local      1              1.280      1.31        0.03
>> raft       1              1.304
>> raft       1              1.334
>> raft       1              1.293
>> raft       1              1.331      1.32        0.02            0.9931584949
>> local      10            14.879
>> local      10            14.333
>> local      10            14.397
>> local      10            14.040
>> local      10            13.573
>> local      10            13.216
>> local      10            13.597
>> local      10            13.858     13.99       0.54
>> raft       10            12.455
>> raft       10            13.998
>> raft       10            13.367
>> raft       10            13.759
>> raft       10            14.301
>> raft       10            13.919
>> raft       10            13.036
>> raft       10            13.410     13.53       0.59            1.033701326
>>
>> Is there some other measurement I should take or does this seem
>> sufficient from a performance perspective?
>>
>> Thanks,
>> Mike
>>
>>
>>
>> On Wed, Jun 1, 2016 at 2:01 PM, Mike Percy <[email protected]> wrote:
>>
>>> I don't think we want to take much of a perf hit. I'll check it out.
>>>
>>> Another reason to have one version of Consensus is that it's currently
>>> not possible to go from 1 node to 3.
>>>
>>> Mike
>>>
>>> On Wed, Jun 1, 2016 at 12:28 PM, Todd Lipcon <[email protected]> wrote:
>>>
>>>> I'm curious also what kind of perf impact we are willing to take for the
>>>> un-replicated case. I think single-node Kudu performing well is actually
>>>> nice from an adoption standpoint (many people have workloads which fit
>>>> on a
>>>> single machine). Would be good to have some simple verification that the
>>>> write perf of single-node raft isn't substantially worse.
>>>>
>>>> -Todd
>>>>
>>>> On Wed, Jun 1, 2016 at 7:41 PM, Mike Percy <[email protected]> wrote:
>>>>
>>>> > On Wed, Jun 1, 2016 at 11:20 AM, David Alves <[email protected]>
>>>> > wrote:
>>>> >
>>>> > > My (and I suspect Todd's) fear here is that we _think_ it's ok but
>>>> we're
>>>> > > not totally sure it works in all cases.
>>>> > >
>>>> >
>>>> > Yep, I'm in the same boat. I haven't seen recent evidence that it
>>>> doesn't
>>>> > work, though.
>>>> >
>>>> >
>>>> > > Regarding the tests, I guess just flip it and see what happens on
>>>> ctest?
>>>> > >
>>>> >
>>>> > Yeah, it fails of course but mostly for silly reasons related to test
>>>> > setup. Working on that.
>>>> >
>>>> >
>>>> > > Regarding the upgrade path, I think we'd need to test this at some
>>>> scale,
>>>> > > i.e. fill up a cluster using the current version, with local
>>>> consensus,
>>>> > and
>>>> > > then replace the binaries with the new version, without it.
>>>> > >
>>>> >
>>>> > +1 SGTM. I don't mind doing that.
>>>> >
>>>> > Mike
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>>>
>>>
>>>
>>
>