This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/orc.git
The following commit(s) were added to refs/heads/main by this push:
new 0184a6605 ORC-1563: Fix `orc.bloom.filter.fpp` default value and `orc.compress` notes of Spark and Hive config docs
0184a6605 is described below
commit 0184a6605975d263d966b3a8b513ed5c2a9cbbd7
Author: sychen <[email protected]>
AuthorDate: Wed Dec 27 11:17:01 2023 -0800
ORC-1563: Fix `orc.bloom.filter.fpp` default value and `orc.compress` notes of Spark and Hive config docs
### What changes were proposed in this pull request?
1. Add `orc.compress` enumeration value description
- LZO, LZ4 (ORC-77) since ORC 1.2
- ZSTD (ORC-363) since ORC 1.6
2. Fix `orc.bloom.filter.fpp` default value (ORC-1338) since ORC 1.8.2
### Why are the changes needed?
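The Spark and Hive config docs are out of date: the `orc.compress` notes omit codecs that ORC already supports, and `orc.bloom.filter.fpp` is still listed with the old default of 0.05.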
### How was this patch tested?
Checked locally.
Closes #1709 from cxzl25/ORC-1563.
Authored-by: sychen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
site/_docs/hive-config.md | 4 ++--
site/_docs/spark-config.md | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/site/_docs/hive-config.md b/site/_docs/hive-config.md
index 99e4863ea..29dc29dfb 100644
--- a/site/_docs/hive-config.md
+++ b/site/_docs/hive-config.md
@@ -12,13 +12,13 @@ with the same options.
Key | Default | Notes
:----------------------- | :---------- | :------------------------
-orc.compress | ZLIB | high level compression = {NONE, ZLIB, SNAPPY}
+orc.compress | ZLIB | high level compression = {NONE, ZLIB, SNAPPY, LZO, LZ4, ZSTD}
orc.compress.size | 262,144 | compression chunk size
orc.stripe.size | 67,108,864 | memory buffer in bytes for writing
orc.row.index.stride | 10,000 | number of rows between index entries
orc.create.index | true | whether the ORC writer create indexes as part of the file or not
orc.bloom.filter.columns | "" | comma separated list of column names
-orc.bloom.filter.fpp | 0.05 | bloom filter false positive rate
+orc.bloom.filter.fpp | 0.01 | bloom filter false positive rate
For example, to create an ORC table without high level compression:
diff --git a/site/_docs/spark-config.md b/site/_docs/spark-config.md
index dca4124c2..b8fbb6db0 100644
--- a/site/_docs/spark-config.md
+++ b/site/_docs/spark-config.md
@@ -12,13 +12,13 @@ with the same options.
Key | Default | Notes
:----------------------- | :---------- | :------------------------
-orc.compress | ZLIB | high level compression = {NONE, ZLIB, SNAPPY, ZSTD}
+orc.compress | ZLIB | high level compression = {NONE, ZLIB, SNAPPY, LZO, LZ4, ZSTD}
orc.compress.size | 262,144 | compression chunk size
orc.stripe.size | 67,108,864 | memory buffer in bytes for writing
orc.row.index.stride | 10,000 | number of rows between index entries
orc.create.index | true | whether the ORC writer create indexes as part of the file or not
orc.bloom.filter.columns | "" | comma separated list of column names
-orc.bloom.filter.fpp | 0.05 | bloom filter false positive rate
+orc.bloom.filter.fpp | 0.01 | bloom filter false positive rate
orc.key.provider | "hadoop" | key provider
orc.encrypt | "" | list of keys and columns to encrypt with
orc.mask | "" | masks to apply to the encrypted columns
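
As a rough illustration of the two options this commit documents, the sketch below renders a Hive-style `TBLPROPERTIES` clause from `orc.compress` and `orc.bloom.filter.fpp`. The helper `orc_tblproperties` and the column names are hypothetical and not part of ORC, Hive, or Spark. The codec list and the 0.01 default are taken from the updated tables, while the range check on the false positive rate is an assumption of this sketch, not documented ORC behavior.

```python
# Codec names as listed in the updated `orc.compress` notes.
ORC_COMPRESS_VALUES = ("NONE", "ZLIB", "SNAPPY", "LZO", "LZ4", "ZSTD")


def orc_tblproperties(compress="ZLIB", bloom_columns="", bloom_fpp=0.01):
    """Render a TBLPROPERTIES clause (hypothetical helper for illustration)."""
    codec = compress.upper()
    if codec not in ORC_COMPRESS_VALUES:
        raise ValueError(f"unsupported orc.compress value: {compress!r}")
    # Assumption of this sketch: a false positive rate must lie strictly in (0, 1).
    if not 0.0 < bloom_fpp < 1.0:
        raise ValueError(f"orc.bloom.filter.fpp out of range: {bloom_fpp!r}")

    props = {"orc.compress": codec}
    # Only emit the bloom filter options when columns were requested.
    if bloom_columns:
        props["orc.bloom.filter.columns"] = bloom_columns
        props["orc.bloom.filter.fpp"] = str(bloom_fpp)

    body = ", ".join(f'"{key}"="{value}"' for key, value in props.items())
    return f"TBLPROPERTIES ({body})"


# Hypothetical column names, used only for the example.
print(orc_tblproperties("zstd", bloom_columns="id,name"))
```

The returned clause is meant to be appended to a `CREATE TABLE ... STORED AS ORC` statement. Leaving `bloom_columns` empty mirrors the documented default of `""`, in which case the clause carries only `orc.compress`.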