This is an automated email from the ASF dual-hosted git repository.

alexey pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git
commit 37f780f00a5a5fdb754ac952896f6c3368d4256d
Author: Alexey Serbin <[email protected]>
AuthorDate: Thu Jan 30 12:36:01 2020 -0800

    [docs] note about space reclamation for deleted rows

    Added a note about not reclaiming the space after deleting rows
    from a table, urging for a proper schema design for larger fact
    tables.

    Change-Id: Ib49e86ed80af0325e3ceaceb9964749534755be4
    Reviewed-on: http://gerrit.cloudera.org:8080/15138
    Reviewed-by: Adar Dembo <[email protected]>
    Tested-by: Kudu Jenkins
---
 docs/schema_design.adoc | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/docs/schema_design.adoc b/docs/schema_design.adoc
index 3ecceed..9b05991 100644
--- a/docs/schema_design.adoc
+++ b/docs/schema_design.adoc
@@ -570,3 +570,14 @@ Non-alterable Column Types:: Kudu does not allow the type of a column to be
 altered.
 
 Partition Splitting:: Partitions cannot be split or merged after table creation.
+
+Deleted row disk space is not reclaimed:: The disk space occupied by a deleted
+row is only reclaimable via compaction, and only when the deletion's age
+exceeds the "tablet history maximum age" (controlled by the
+`--tablet_history_max_age_sec` flag). Furthermore, Kudu currently only schedules
+compactions in order to improve read/write performance; a tablet will never be
+compacted purely to reclaim disk space. As such, range partitioning should be
+used when it is expected that large swaths of rows will be discarded. With range
+partitioning, individual partitions may be dropped to discard data and reclaim
+disk space. See link:https://issues.apache.org/jira/browse/KUDU-1625[KUDU-1625]
+for details.
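The trade-off described in the new note can be illustrated with a small toy model. This is not Kudu code and does not reflect Kudu's storage engine or client API. `RangePartitionedTable`, `compact_all`, the integer partition key, and the 900-second history age are all hypothetical. The model counts a deleted row's bytes until a compaction runs after the deletion is older than a history maximum age, which stands in for `--tablet_history_max_age_sec`. It also lets a whole range partition be dropped at once. In the model, compaction is called explicitly, whereas Kudu schedules compactions only for read/write performance.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Toy model only: every name here is hypothetical and does not mirror Kudu internals.


@dataclass
class Row:
    key: int
    size_bytes: int
    deleted_at: Optional[float] = None  # deletion time; None while the row is live


@dataclass
class Tablet:
    rows: List[Row] = field(default_factory=list)

    def disk_bytes(self) -> int:
        # Deleted rows keep occupying disk space until a compaction purges them.
        return sum(r.size_bytes for r in self.rows)

    def compact(self, now: float, history_max_age: float) -> None:
        # Only deletions older than the history max age may be purged.
        self.rows = [
            r for r in self.rows
            if r.deleted_at is None or now - r.deleted_at <= history_max_age
        ]


class RangePartitionedTable:
    def __init__(self) -> None:
        # One tablet per half-open key range [lower, upper).
        self.tablets: Dict[Tuple[int, int], Tablet] = {}

    def add_range_partition(self, lower: int, upper: int) -> None:
        self.tablets[(lower, upper)] = Tablet()

    def _tablet_for(self, key: int) -> Tablet:
        for (lower, upper), tablet in self.tablets.items():
            if lower <= key < upper:
                return tablet
        raise KeyError(f"no range partition covers key {key}")

    def insert(self, key: int, size_bytes: int) -> None:
        self._tablet_for(key).rows.append(Row(key, size_bytes))

    def delete(self, key: int, now: float) -> None:
        # A delete only marks the row; its bytes stay on disk.
        for row in self._tablet_for(key).rows:
            if row.key == key and row.deleted_at is None:
                row.deleted_at = now

    def compact_all(self, now: float, history_max_age: float) -> None:
        # Explicit compaction, for illustration only (Kudu does not compact
        # purely to reclaim space).
        for tablet in self.tablets.values():
            tablet.compact(now, history_max_age)

    def drop_range_partition(self, lower: int, upper: int) -> None:
        # Dropping a partition discards its tablet and all of its data at once.
        del self.tablets[(lower, upper)]

    def disk_bytes(self) -> int:
        return sum(t.disk_bytes() for t in self.tablets.values())


if __name__ == "__main__":
    table = RangePartitionedTable()
    table.add_range_partition(0, 100)
    table.add_range_partition(100, 200)
    for key in range(200):
        table.insert(key, size_bytes=10)

    # Row-by-row deletes of the whole first range: the bytes stay on disk.
    for key in range(100):
        table.delete(key, now=0.0)
    print(table.disk_bytes())  # 2000

    # Even a compaction keeps deletions younger than the history max age.
    table.compact_all(now=60.0, history_max_age=900.0)
    print(table.disk_bytes())  # 2000

    # Dropping the range partition discards its data outright.
    table.drop_range_partition(0, 100)
    print(table.disk_bytes())  # 1000
```

The half-open `[lower, upper)` ranges follow Kudu's default range bounds (inclusive lower, exclusive upper). With the real Java client, the corresponding operation is an alter-table call using `AlterTableOptions.dropRangePartition`.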
