yongqian created ORC-2131:
-----------------------------

             Summary: Set default of orc.stripe.size.check.ratio and 
orc.dictionary.max.size.bytes to 0 
                 Key: ORC-2131
                 URL: https://issues.apache.org/jira/browse/ORC-2131
             Project: ORC
          Issue Type: Improvement
            Reporter: yongqian
            Assignee: yongqian


Background

After enabling the optimizations related to {{orc.stripe.size.check.ratio}} and 
{{orc.dictionary.max.size.bytes}}, we observed that ORC files written with 
the current defaults are about 10%–20% larger than before. For example, 
datasets that were previously ~1.0–1.1 TB grow to ~1.2 TB with the current 
defaults, a noticeable increase in storage and I/O cost.

Current defaults
 * {{orc.dictionary.max.size.bytes}}: 16 MB (16 * 1024 * 1024 bytes). Dictionary 
encoding is turned off when the dictionary size exceeds this limit.
 * {{orc.stripe.size.check.ratio}}: 2.0. A stripe is flushed when the tree writer 
size exceeds (ratio × orc.stripe.size).
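To make the two thresholds concrete, here is a rough sketch of the arithmetic implied by the descriptions above. It is illustrative only and is not ORC's writer code. The function names are hypothetical, and the 64 MiB value used for orc.stripe.size is an assumption, since this issue does not state it.

```python
# Illustrative arithmetic only; this is not ORC's writer implementation.
MIB = 1024 * 1024

# Current defaults as quoted in this issue.
DICTIONARY_MAX_SIZE_BYTES = 16 * MIB
STRIPE_SIZE_CHECK_RATIO = 2.0

# Assumption: orc.stripe.size is 64 MiB; the issue does not state this value.
ASSUMED_STRIPE_SIZE_BYTES = 64 * MIB


def stripe_flush_threshold(stripe_size_bytes, check_ratio):
    """Tree writer size (bytes) above which a stripe is flushed: ratio x stripe size."""
    return int(check_ratio * stripe_size_bytes)


def dictionary_encoding_disabled(dictionary_size_bytes, max_size_bytes):
    """True when the dictionary size exceeds the configured limit."""
    return dictionary_size_bytes > max_size_bytes


threshold = stripe_flush_threshold(ASSUMED_STRIPE_SIZE_BYTES, STRIPE_SIZE_CHECK_RATIO)
print(f"flush threshold: {threshold // MIB} MiB")
print(dictionary_encoding_disabled(20 * MIB, DICTIONARY_MAX_SIZE_BYTES))
```

Under the assumed stripe size, the flush check would trigger once the tree writer grows past twice the stripe size, and a 20 MiB dictionary would exceed the 16 MB limit.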



--
This message was sent by Atlassian Jira
(v8.20.10#820010)