This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/orc.git
The following commit(s) were added to refs/heads/main by this push:
new 21c43267e ORC-2031:Document orc.dictionary.max.size.bytes and
orc.stripe.size.check.ratio
21c43267e is described below
commit 21c43267e9817eba9adc4b54c043417b7bfdcdec
Author: yongqian <[email protected]>
AuthorDate: Tue Oct 21 09:21:13 2025 -0700
ORC-2031:Document orc.dictionary.max.size.bytes and
orc.stripe.size.check.ratio
### What changes were proposed in this pull request?
Add documentation for two ORC configuration options to core-java-config.md:
- orc.dictionary.max.size.bytes (default: 16777216)
- orc.stripe.size.check.ratio (default: 2.0)
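
A minimal sketch of how the two thresholds read, based only on the descriptions added by this patch. It is not ORC's implementation: the class `OrcConfSketch` and both helper methods are hypothetical, the settings are held in a plain `java.util.Properties` rather than an ORC or Hadoop configuration object, and the `orc.stripe.size` value of 64 MiB is an assumption made for the arithmetic.

```java
import java.util.Properties;

public class OrcConfSketch {
    // Hypothetical helper (not an ORC API): true when the documented rule
    // says dictionary encoding should be turned off. A limit of 0 disables the check.
    public static boolean exceedsDictionaryLimit(long dictionaryBytes, long maxSizeBytes) {
        return maxSizeBytes > 0 && dictionaryBytes > maxSizeBytes;
    }

    // Hypothetical helper (not an ORC API): true when the tree writer size is
    // larger than (ratio * stripeSize). A ratio of 0 disables the check.
    public static boolean shouldFlushStripe(long treeWriterBytes, long stripeSize, double ratio) {
        return ratio > 0 && treeWriterBytes > ratio * stripeSize;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("orc.dictionary.max.size.bytes", "16777216"); // documented default
        conf.setProperty("orc.stripe.size.check.ratio", "2.0");        // documented default
        conf.setProperty("orc.stripe.size", "67108864");               // assumed: 64 MiB

        long maxDict = Long.parseLong(conf.getProperty("orc.dictionary.max.size.bytes"));
        double ratio = Double.parseDouble(conf.getProperty("orc.stripe.size.check.ratio"));
        long stripeSize = Long.parseLong(conf.getProperty("orc.stripe.size"));

        // A 20 MB dictionary is over the 16 MiB limit.
        System.out.println(exceedsDictionaryLimit(20_000_000L, maxDict));
        // 150 MB of buffered data is over 2.0 * 64 MiB = 134217728 bytes.
        System.out.println(shouldFlushStripe(150_000_000L, stripeSize, ratio));
    }
}
```

With these values, a ratio of 2.0 means the check fires only once the buffered size passes twice the configured stripe size; setting either option to 0 makes the corresponding helper always return false.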
### Why are the changes needed?
These configuration options were defined in OrcConf.java but missing from
the official documentation. Users need official guidance on their purpose and
usage.
### How was this patch tested?
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #2450 from QianyongY/features/document_orc_conf_2031.
Authored-by: yongqian <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
site/_docs/core-java-config.md | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/site/_docs/core-java-config.md b/site/_docs/core-java-config.md
index 067852a47..42bbbd17f 100644
--- a/site/_docs/core-java-config.md
+++ b/site/_docs/core-java-config.md
@@ -165,6 +165,13 @@ permalink: /docs/core-java-config.html
If the number of distinct keys in a dictionary is greater than this
fraction of the total number of non-null rows, turn off dictionary encoding.
Use 1 to always use dictionary encoding.
</td>
</tr>
+<tr>
+ <td><code>orc.dictionary.max.size.bytes</code></td>
+ <td>16777216</td>
+ <td>
+ If the total size of the dictionary is greater than this, turn off dictionary encoding. Use 0 to disable this check.
+ </td>
+</tr>
<tr>
<td><code>orc.dictionary.early.check</code></td>
<td>true</td>
@@ -284,6 +291,13 @@ permalink: /docs/core-java-config.html
How often should MemoryManager check the memory sizes? Measured in rows
added to all of the writers. Valid range is [1,10000] and is primarily meant
for testing. Setting this too low may negatively affect performance. Use
orc.stripe.row.count instead if the value larger than orc.stripe.row.count.
</td>
</tr>
+<tr>
+ <td><code>orc.stripe.size.check.ratio</code></td>
+ <td>2.0</td>
+ <td>
+ Flush stripe if the tree writer size in bytes is larger than (this * orc.stripe.size). Use 0 to disable this check.
+ </td>
+</tr>
<tr>
<td><code>orc.overwrite.output.file</code></td>
<td>false</td>