[
https://issues.apache.org/jira/browse/KUDU-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213532#comment-17213532
]
ASF subversion and git services commented on KUDU-3195:
-------------------------------------------------------
Commit 640a84ecff857c3d0447c690c68e2361eb3e9c3b in kudu's branch
refs/heads/master from Andrew Wong
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=640a84e ]
KUDU-3195: flush when any DMS in the tablet is older than the time threshold
Currently each tablet will wait at least 2 minutes (controlled by
--flush_threshold_secs) between flushing DMSs, even if there are several
DMSs that are older than 2 minutes in a given tablet. This means that
for tablets with several dozen rowsets and updates across the entire
tablet, it could take hours to flush all the deltas.
Rather than waiting for 2 minutes since the last flush time before
considering time-based flushing, this patch tracks the creation time of
every DMS and flushes as long as there is a DMS that is older than 2
minutes in the tablet.
Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
Reviewed-on: http://gerrit.cloudera.org:8080/16581
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <[email protected]>
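The policy change described in the commit above can be sketched roughly as follows. This is a minimal illustration, not Kudu's actual implementation: the struct and function names here are hypothetical, and the real maintenance manager tracks considerably more state.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a delta memstore; only the field relevant
// to time-based flushing is modeled.
struct DeltaMemStore {
  int64_t creation_time_secs;  // time at which this DMS was created
};

// Old behavior (roughly): a tablet becomes eligible for a time-based DMS
// flush only once flush_threshold_secs have elapsed since its last flush,
// so at most one DMS is flushed per tablet per interval.
bool ShouldFlushOld(int64_t last_flush_time_secs, int64_t now_secs,
                    int64_t flush_threshold_secs) {
  return now_secs - last_flush_time_secs >= flush_threshold_secs;
}

// New behavior: the tablet is eligible whenever ANY of its DMSs is older
// than the threshold, regardless of when the last flush happened.
bool ShouldFlushNew(const std::vector<DeltaMemStore>& dms_list,
                    int64_t now_secs, int64_t flush_threshold_secs) {
  return std::any_of(dms_list.begin(), dms_list.end(),
                     [&](const DeltaMemStore& dms) {
                       return now_secs - dms.creation_time_secs >=
                              flush_threshold_secs;
                     });
}
```

Under the old check, a tablet with dozens of stale DMSs would drain them one threshold-interval at a time; under the new check, every stale DMS keeps the tablet eligible until all of them are flushed.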
> Make DMS flush policy more robust when maintenance threads are idle
> -------------------------------------------------------------------
>
> Key: KUDU-3195
> URL: https://issues.apache.org/jira/browse/KUDU-3195
> Project: Kudu
> Issue Type: Improvement
> Components: tserver
> Affects Versions: 1.13.0
> Reporter: Alexey Serbin
> Priority: Major
>
> In one scenario I observed very long bootstrap times of tablet servers
> (somewhere between 45 and 60 minutes) even though the tablet servers had a
> relatively small amount of data under management (~80GByte). It turned out
> the time was spent on replaying WAL segments, with {{kudu cluster ksck}}
> reporting something like below all the time during bootstrap:
> {noformat}
> b0a20b117a1242ae9fc15620a6f7a524 (tserver-6.local.site:7050): not running
> State: BOOTSTRAPPING
> Data state: TABLET_DATA_READY
> Last status: Bootstrap replaying log segment 21/37 (2.28M/7.85M this
> segment, stats: ops{read=27374 overwritten=0 applied=25016 ignored=657}
> inserts{seen=5949247
> ignored=0} mutations{seen=0 ignored=0} orphaned_commits=7)
> {noformat}
> The workload I ran before shutting down the tablet servers consisted of many
> small UPSERT operations, but the cluster had been idle for a long time (a few
> hours or so) after the workload was terminated. The workload was generated by
> {noformat}
> kudu perf loadgen \
> --table_name=$TABLE_NAME \
> --num_rows_per_thread=800000000 \
> --num_threads=4 \
> --use_upsert \
> --use_random_pk \
> $MASTER_ADDR
> {noformat}
> The table that the UPSERT workload was running against had been pre-populated
> by the following:
> {noformat}
> kudu perf loadgen \
> --table_num_replicas=3 \
> --keep_auto_table \
> --table_num_hash_partitions=5 \
> --table_num_range_partitions=5 \
> --num_rows_per_thread=800000000 \
> --num_threads=4 \
> $MASTER_ADDR
> {noformat}
> As it turned out, the tablet servers had accumulated a huge number of DMSs
> which required flushing/compaction, but after the memory pressure subsided,
> the compaction policy was scheduling just one operation per tablet every 120
> seconds (the interval is controlled by {{\-\-flush_threshold_secs}}).
> In fact, the tablet servers could have flushed those rowsets non-stop, since
> the maintenance threads were otherwise completely idle and there was no
> active workload running against the cluster. Those DMSs had been around for
> a long time (much longer than 120 seconds) and were anchoring a lot of WAL
> segments, so the operations from the WAL had to be replayed once I restarted
> the tablet servers.
> It would be great to update the flushing/compaction policy to allow tablet
> servers to run {{FlushDeltaMemStoresOp}} as soon as a DMS becomes older than
> the interval specified by {{\-\-flush_threshold_secs}}, provided the
> maintenance threads are not otherwise busy.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)