[
https://issues.apache.org/jira/browse/KUDU-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063768#comment-17063768
]
ASF subversion and git services commented on KUDU-1625:
-------------------------------------------------------
Commit 705954872dc86238556456abed0a879bb1462e51 in kudu's branch
refs/heads/master from Andrew Wong
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=7059548 ]
KUDU-1625: background op to GC ancient, fully deleted rowsets
This adds a background op that deletes disk rowsets that have had all of
their rows deleted. If the most recent update to a rowset is older than
the ancient history mark, and the rowset contains no live rows, that
rowset will be deleted.
It'd be nice if we could have the policy work for rowsets that are
mostly deleted, but such a solution would come with difficult questions
around write amplification and compatibility with the existing
compaction strategies. For instance, a more complete solution would
need to consider whether to rewrite a rowset if it had 25%, 50%, or 75%
deleted rows: some operators wouldn't mind the write amplification to
save space. However, picking a good heuristic (or exposing some knobs to
turn) makes this tricky.
The benefit of the approach in this patch is that no such tradeoff needs
to be made: the "write amplification" is minimal here because no new
data blocks are written in performing the operation -- the tablet
metadata is rewritten to exclude the blocks, and the underlying blocks
are deleted, which isn't I/O-intensive either.
One remaining gap in this implementation: currently a DMS flush writes
stats to disk, but we only read those stats when we Init() the
DeltaFileReader (e.g. on scan). I'll address this in a follow-up patch.
Since the op GCs all viable rowsets in the tablet, a tablet should only
schedule one deleted rowset GC op at a time. This isn't necessary for
correctness, but avoids wasting some MM thread cycles.
I ran this on a real cluster, deleting large chunks of keyspace with 4
MM threads to confirm that space is actually freed, concurrent ops for
the same tablet aren't scheduled, and the op runs relatively quickly (in
the tens of ms, compared to hundreds to thousands of ms for other ops).
Change-Id: I696e2a29ea52ad4e54801b495c322bc371787124
Reviewed-on: http://gerrit.cloudera.org:8080/15145
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <[email protected]>
> Schedule compaction on rowsets with high percentage of deleted data
> -------------------------------------------------------------------
>
> Key: KUDU-1625
> URL: https://issues.apache.org/jira/browse/KUDU-1625
> Project: Kudu
> Issue Type: Improvement
> Components: tablet
> Affects Versions: 1.0.0
> Reporter: Todd Lipcon
> Priority: Major
>
> Although with KUDU-236 we can now remove rows that were deleted prior to the
> ancient history mark, we don't actively schedule compactions based on deleted
> rows. So, if for example we have a fully compacted table and issue a DELETE
> for every row, the data size actually does not change, because no compactions
> are triggered.
> We need some way to notice the fact that the ratio of deletes to rows is high
> and decide to compact those rowsets.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)