Repository: kudu
Updated Branches:
  refs/heads/master a78171026 -> 12ae13b03


[docs] Add docs about disk usage due to sparse files

A few times users have been confused about the amount of space Kudu
is using with the log block manager because Kudu uses sparse files.
This adds a quick bit of docs explaining the source of this
discrepancy and showing how to get accurate numbers.

Change-Id: I4e73d7d5f2edc8a2676f3207e06d29ec89f7e1a0
Reviewed-on: http://gerrit.cloudera.org:8080/9817
Tested-by: Kudu Jenkins
Reviewed-by: Attila Bukor <abu...@cloudera.com>
Reviewed-by: Adar Dembo <a...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/8728dfc6
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/8728dfc6
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/8728dfc6

Branch: refs/heads/master
Commit: 8728dfc680d60f3482938f8c2876cc53301aab58
Parents: a781710
Author: Will Berkeley <wdberke...@apache.org>
Authored: Mon Mar 26 22:35:44 2018 -0700
Committer: Will Berkeley <wdberke...@gmail.com>
Committed: Tue Mar 27 17:18:01 2018 +0000

----------------------------------------------------------------------
 docs/troubleshooting.adoc | 48 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/8728dfc6/docs/troubleshooting.adoc
----------------------------------------------------------------------
diff --git a/docs/troubleshooting.adoc b/docs/troubleshooting.adoc
index 11f166f..73cd530 100644
--- a/docs/troubleshooting.adoc
+++ b/docs/troubleshooting.adoc
@@ -256,6 +256,54 @@ TIP: NTP requires a network connection and may take a few minutes to synchronize
 at startup. In some cases a spotty network connection may make NTP report the clock as unsynchronized.
 A common, though temporary, workaround for this is to restart NTP with one of the commands above.
 
+[[disk_space_usage]]
+== Disk Space Usage
+
+When using the log block manager (the default on Linux), Kudu uses
+link:https://en.wikipedia.org/wiki/Sparse_file[sparse files] to store data. A
+sparse file has a different apparent size than the actual amount of disk space
+it uses. This means that some tools may inaccurately report the disk space
+used by Kudu. For example, the size listed by `ls -l` does not accurately
+reflect the disk space used by Kudu data files:
+
+[source,bash]
+----
+$ ls -lh /data/kudu/tserver/data
+total 117M
+-rw------- 1 kudu kudu 160M Mar 26 19:37 0b9807b8b17d48a6a7d5b16bf4ac4e6d.data
+-rw------- 1 kudu kudu 4.4K Mar 26 19:37 0b9807b8b17d48a6a7d5b16bf4ac4e6d.metadata
+-rw------- 1 kudu kudu  32M Mar 26 19:37 2f26eeacc7e04b65a009e2c9a2a8bd20.data
+-rw------- 1 kudu kudu 4.3K Mar 26 19:37 2f26eeacc7e04b65a009e2c9a2a8bd20.metadata
+-rw------- 1 kudu kudu 672M Mar 26 19:37 30a2dd2cd3554d8a9613f588a8d136ff.data
+-rw------- 1 kudu kudu 4.4K Mar 26 19:37 30a2dd2cd3554d8a9613f588a8d136ff.metadata
+-rw------- 1 kudu kudu  32M Mar 26 19:37 7434c83c5ec74ae6af5974e4909cbf82.data
+-rw------- 1 kudu kudu 4.3K Mar 26 19:37 7434c83c5ec74ae6af5974e4909cbf82.metadata
+-rw------- 1 kudu kudu 672M Mar 26 19:37 772d070347a04f9f8ad2ad3241440090.data
+-rw------- 1 kudu kudu 4.4K Mar 26 19:37 772d070347a04f9f8ad2ad3241440090.metadata
+-rw------- 1 kudu kudu 160M Mar 26 19:37 86e50a95531f46b6a79e671e6f5f4151.data
+-rw------- 1 kudu kudu 4.4K Mar 26 19:37 86e50a95531f46b6a79e671e6f5f4151.metadata
+-rw------- 1 kudu kudu  687 Mar 26 19:26 block_manager_instance
+----
+
+Notice that the `total` line reports 117MiB of allocated disk space, while the
+first file's apparent size alone is listed as 160MiB. Adding the `-s` option
+makes `ls` also print each file's allocated disk space.
+
+The `du` and `df` utilities report the actual disk space usage by default.
+
+[source,bash]
+----
+$ du -h /data/kudu/tserver/data
+118M   /data/kudu/tserver/data
+----
+
+The apparent size can be shown with the `--apparent-size` flag to `du`.
+
+[source,bash]
+----
+$ du -h --apparent-size /data/kudu/tserver/data
+1.7G  /data/kudu/tserver/data
+----
 
 [[crash_reporting]]
 == Reporting Kudu Crashes
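
The gap between apparent size and allocated disk space described in the new section can also be checked from a script. The following is a minimal sketch, not part of the commit: the helper names `disk_usage` and `make_sparse_file` and the file name `example.data` are hypothetical, and the sparse file is created with a plain `truncate`, which is only an illustration and not necessarily how Kudu produces sparse regions. It assumes a POSIX system, where `st_blocks` counts 512-byte units.

```python
import os
import tempfile


def disk_usage(path):
    """Return (apparent_bytes, allocated_bytes) for a single file."""
    st = os.stat(path)
    # st_size is the apparent size, the figure `ls -l` lists.
    # st_blocks counts allocated 512-byte units (POSIX), the basis for `du`.
    return st.st_size, st.st_blocks * 512


def make_sparse_file(path, size):
    """Create a file with the given apparent size without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical file name, chosen to resemble a Kudu data file.
        path = os.path.join(d, "example.data")
        make_sparse_file(path, 160 * 1024 * 1024)
        apparent, allocated = disk_usage(path)
        print(f"apparent:  {apparent} bytes")
        print(f"allocated: {allocated} bytes")
```

On a filesystem that supports sparse files, the allocated figure should come out far smaller than the apparent one; on a filesystem without that support the two may be close.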
