[ 
https://issues.apache.org/jira/browse/ORC-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongqian reassigned ORC-2131:
-----------------------------


> Set default of orc.stripe.size.check.ratio and orc.dictionary.max.size.bytes 
> to 0 
> ----------------------------------------------------------------------------------
>
>                 Key: ORC-2131
>                 URL: https://issues.apache.org/jira/browse/ORC-2131
>             Project: ORC
>          Issue Type: Improvement
>            Reporter: yongqian
>            Assignee: yongqian
>            Priority: Major
>
> Background
> After enabling the optimizations related to {{orc.stripe.size.check.ratio}} 
> and {{orc.dictionary.max.size.bytes}}, we observed that ORC files written 
> with the current defaults are about 10%–20% larger than before. For example, 
> datasets that were previously ~1.0–1.1 TB grow to ~1.2 TB with the current 
> defaults, causing a noticeable increase in storage and I/O cost.
> Current defaults
>  * {{orc.dictionary.max.size.bytes}}: 16MB (16 * 1024 * 1024). Dictionary 
> encoding is turned off when the dictionary size exceeds this limit.
>  * {{orc.stripe.size.check.ratio}}: 2.0. A stripe is flushed when the tree 
> writer size exceeds (ratio × orc.stripe.size).
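
To make the two thresholds above concrete, here is a rough arithmetic sketch in Python. It is not ORC's implementation: the function names are hypothetical, and the 64 MiB stripe size is an assumption chosen only for illustration. It covers only the positive defaults listed above; the behavior at 0 is not described in this excerpt and is not modeled.

```python
MIB = 1024 * 1024

# Values quoted in the issue description.
DICTIONARY_MAX_SIZE_BYTES = 16 * MIB
STRIPE_SIZE_CHECK_RATIO = 2.0

# Assumption: a 64 MiB stripe size, used here only for illustration.
ASSUMED_STRIPE_SIZE_BYTES = 64 * MIB


def stripe_flush_threshold(ratio: float, stripe_size: int) -> float:
    """Tree writer size above which a stripe is flushed (ratio x stripe size)."""
    return ratio * stripe_size


def dictionary_encoding_disabled(dictionary_size: int, limit: int) -> bool:
    """True when the dictionary size exceeds the configured limit."""
    return dictionary_size > limit


if __name__ == "__main__":
    threshold = stripe_flush_threshold(STRIPE_SIZE_CHECK_RATIO, ASSUMED_STRIPE_SIZE_BYTES)
    print("stripe flush threshold (bytes):", threshold)
    print("dictionary limit (bytes):", DICTIONARY_MAX_SIZE_BYTES)
```

Under these assumptions, a stripe would be flushed once the tree writer size passes 128 MiB, and a column would fall back from dictionary encoding once its dictionary grows beyond 16 MiB.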



--
This message was sent by Atlassian Jira
(v8.20.10#820010)