Alexey Serbin created KUDU-3195:
-----------------------------------
Summary: Make DMS flush policy more robust when maintenance threads
are idle
Key: KUDU-3195
URL: https://issues.apache.org/jira/browse/KUDU-3195
Project: Kudu
Issue Type: Improvement
Components: tserver
Affects Versions: 1.13.0
Reporter: Alexey Serbin
In one scenario I observed very long bootstrap times of tablet servers
(somewhere between 45 and 60 minutes) even though the tablet servers had a
relatively small amount of data under management (~80 GByte). It turned out
the time was spent replaying WAL segments, with {{kudu cluster ksck}}
reporting something like below the whole time during bootstrap:
{noformat}
b0a20b117a1242ae9fc15620a6f7a524 (tserver-6.local.site:7050): not running
State: BOOTSTRAPPING
Data state: TABLET_DATA_READY
Last status: Bootstrap replaying log segment 21/37 (2.28M/7.85M this
segment, stats: ops{read=27374 overwritten=0 applied=25016 ignored=657}
inserts{seen=5949247
ignored=0} mutations{seen=0 ignored=0} orphaned_commits=7)
{noformat}
The workload I ran before shutting down the tablet servers consisted of many
small UPSERT operations, but the cluster had been idle for a long time (a few
hours or so) after the workload was terminated. The workload was generated by
{noformat}
kudu perf loadgen \
--table_name=$TABLE_NAME \
--num_rows_per_thread=800000000 \
--num_threads=4 \
--use_upsert \
--use_random_pk \
$MASTER_ADDR
{noformat}
The table that the UPSERT workload was running against had been pre-populated
by the following:
{noformat}
kudu perf loadgen --table_num_replicas=3 --keep_auto_table
--table_num_hash_partitions=5 --table_num_range_partitions=5
--num_rows_per_thread=800000000 --num_threads=4 $MASTER_ADDR
{noformat}
As it turned out, the tablet servers had accumulated a huge number of DMSs
which required flushing/compaction, but after the memory pressure subsided,
the compaction policy was scheduling just one operation per tablet every 120
seconds (the latter interval is controlled by {{\-\-flush_threshold_secs}}).
In fact, the tablet servers could have flushed those rowsets non-stop, since
the maintenance threads were completely idle otherwise and there was no
active workload running against the cluster. Those DMSs had been around for
a long time (much longer than 120 seconds) and were anchoring a lot of WAL
segments, so the operations from the WAL had to be replayed once I restarted
the tablet servers.
It would be great to update the flushing/compaction policy to allow tablet
servers to run {{FlushDeltaMemStoresOp}} as soon as a DMS becomes older than
the threshold specified by {{\-\-flush_threshold_secs}}, when the maintenance
threads are not otherwise busy.
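For illustration, below is a rough sketch of how the perf-improvement score
for such a flush operation could keep growing with DMS age once it passes
{{\-\-flush_threshold_secs}}, so that otherwise-idle maintenance threads keep
draining old DMSs back-to-back instead of one per tablet every 120 seconds.
This is not actual Kudu code: the function and parameter names are made up
for the sake of the example.
{noformat}
#include <algorithm>
#include <cstdint>

// Hypothetical scoring helper, not Kudu's actual maintenance-manager API.
// Once a DMS is older than --flush_threshold_secs, its score keeps growing
// with age instead of staying flat, so idle maintenance threads continue to
// schedule DMS flushes until the backlog is drained.
double DmsFlushScore(int64_t dms_age_secs,
                     int64_t flush_threshold_secs,
                     double mb_anchored) {
  if (dms_age_secs < flush_threshold_secs) {
    // Young DMS: let memory pressure or WAL retention drive the flush.
    return 0.0;
  }
  // Base score proportional to the data anchored by the DMS, plus a bonus
  // that grows the longer the DMS sits past the threshold.
  const double age_past_threshold =
      static_cast<double>(dms_age_secs - flush_threshold_secs);
  const double bonus =
      age_past_threshold / std::max<double>(flush_threshold_secs, 1.0);
  return mb_anchored * (1.0 + bonus);
}
{noformat}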