This is an automated email from the ASF dual-hosted git repository.
william pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/orc.git
The following commit(s) were added to refs/heads/main by this push:
new 43fa5224e MINOR: Add sizes orc tools doc
43fa5224e is described below
commit 43fa5224ec891ad4c2f5036a052fb094a41568bf
Author: deshanxiao <[email protected]>
AuthorDate: Tue Sep 27 16:21:09 2022 -0700
MINOR: Add sizes orc tools doc
### What changes were proposed in this pull request?
Doc changes: Add the docs of sub-command `sizes`
### Why are the changes needed?
When the sub-command `sizes` was introduced to the ORC tools, it had no
documentation. We need to add it.
### How was this patch tested?
Existing UT
Closes #1265 from deshanxiao/add-doc.
Authored-by: deshanxiao <[email protected]>
Signed-off-by: William Hyun <[email protected]>
---
site/_docs/java-tools.md | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/site/_docs/java-tools.md b/site/_docs/java-tools.md
index 8630b33f1..7b1069ea1 100644
--- a/site/_docs/java-tools.md
+++ b/site/_docs/java-tools.md
@@ -18,6 +18,7 @@ The subcommands for the tools are:
* key (since ORC 1.5) - print information about the encryption keys
* meta - print the metadata of an ORC file
* scan (since ORC 1.3) - scan the data for benchmarking
+ * sizes (since ORC 1.7.2) - list size on disk of each column
* version (since ORC 1.6) - print the version of this ORC tool
The command line looks like:
@@ -313,6 +314,23 @@ cost of printing the data out.
`-v,--verbose`
: Print exceptions
+## Java Sizes
+
+The sizes command lists size on disk of each column. The output contains not
+only the raw data of the table, but also the size of metadata such as `padding`,
+`stripeFooter`, `fileFooter`, `stripeIndex` and `stripeData`.
+
+~~~ shell
+% java -jar orc-tools-X.Y.Z-uber.jar sizes examples/my-file.orc
+Percent Bytes/Row Name
+ 98.45 2.62 y
+ 0.81 0.02 _file_footer
+ 0.30 0.01 _index
+ 0.25 0.01 x
+ 0.19 0.01 _stripe_footer
+______________________________________________________________________
+~~~
+
## Java Version
The version command prints the version of this ORC tool.