[ https://issues.apache.org/jira/browse/ORC-373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502320#comment-16502320 ]
Prasanth Jayachandran commented on ORC-373:
-------------------------------------------
The existing check disables dictionary encoding only just before writing to
the stream. Records still end up in the RBTree initially; after the first row
group is created or the stripe is flushed, the RBTree is flushed directly to a
DIRECT/DIRECT_V2 stream instead of a DICTIONARY stream.
This patch avoids buffering to the RBTree from the beginning.
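
To illustrate the difference, here is a minimal sketch in plain Java. It is not
the actual ORC writer code: the class DictionaryThresholdSketch, its methods,
and the list used as the "direct stream" are hypothetical names invented for
this example, and java.util.TreeMap (a red-black tree in the JDK) stands in for
ORC's RBTree. The only idea it takes from the comment above is that the
decision is made once, up front, so that values never enter the tree when the
threshold is <= 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical stand-in for a string column writer; not the ORC implementation. */
final class DictionaryThresholdSketch {
    // Decided once at construction, before any row is written.
    private final boolean useDictionary;
    // TreeMap is a red-black tree; it stands in for ORC's RBTree here.
    private final TreeMap<String, Integer> dictionary = new TreeMap<>();
    // Stand-in for the DIRECT/DIRECT_V2 stream.
    private final List<String> directStream = new ArrayList<>();

    DictionaryThresholdSketch(double dictionaryKeyThreshold) {
        // Assumption for this sketch: a threshold <= 0.0 means "never try dictionary encoding".
        this.useDictionary = dictionaryKeyThreshold > 0.0;
    }

    void write(String value) {
        if (useDictionary) {
            // Buffer in the tree, counting occurrences per distinct key.
            dictionary.merge(value, 1, Integer::sum);
        } else {
            // Bypass the tree entirely and append to the direct stream.
            directStream.add(value);
        }
    }

    int bufferedKeys() {
        return dictionary.size();
    }

    int directValues() {
        return directStream.size();
    }

    public static void main(String[] args) {
        DictionaryThresholdSketch writer = new DictionaryThresholdSketch(0.0);
        writer.write("a");
        writer.write("b");
        writer.write("a");
        System.out.println("direct=" + writer.directValues()
            + " buffered=" + writer.bufferedKeys());
    }
}
```

With a threshold of 0.0 the tree stays empty and all three values go to the
direct stream; with a positive threshold the same writes would be buffered as
two distinct keys instead.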
> If "orc.dictionary.key.threshold" is set to 0, don't try dictionary encoding.
> -----------------------------------------------------------------------------
>
> Key: ORC-373
> URL: https://issues.apache.org/jira/browse/ORC-373
> Project: ORC
> Issue Type: Bug
> Affects Versions: 1.5.2
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Priority: Major
> Fix For: 1.5.2, 1.6.0
>
>
> Currently the dictionary check happens after the first row group entry is
> created. Even when row indexes are disabled, rows end up in the red-black tree
> first before being flushed into the direct stream when the stripe is written.
> If the dictionary threshold is set to <= 0.0, we should disable dictionary
> encoding and write directly to the stream instead of the RBTree. This is useful
> for Hive streaming ingest, where delta files explicitly disable dictionaries.