[ https://issues.apache.org/jira/browse/HIVE-11665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanth Jayachandran updated HIVE-11665:
-----------------------------------------
Summary: ORC StringDictionaryReader should not use Chunked buffers (was: ORC StringDictionaryReader should not used Chunked buffers)
> ORC StringDictionaryReader should not use Chunked buffers
> ---------------------------------------------------------
>
> Key: HIVE-11665
> URL: https://issues.apache.org/jira/browse/HIVE-11665
> Project: Hive
> Issue Type: Improvement
> Components: File Formats
> Affects Versions: 1.3.0, 2.0.0
> Reporter: Gopal V
> Assignee: Prasanth Jayachandran
> Attachments: orc-stringdict-reader.png
>
>
> The ORC string dictionary reader is slow because it copies the dictionary stream into a chunked buffer (DynamicByteArray).
> {code:title=ql/src/java/org/apache/hadoop/hive/ql/io/orc/TreeReaderFactory.java#L1678}
> private void readDictionaryStream(InStream in) throws IOException {
>   if (in != null) { // Guard against empty dictionary stream.
>     if (in.available() > 0) {
>       dictionaryBuffer = new DynamicByteArray(64, in.available());
>       dictionaryBuffer.readAll(in);
>       // Since it's the start of a stripe, invalidate the cache.
>       dictionaryBufferInBytesCache = null;
>     }
>     in.close();
>   } else {
>     dictionaryBuffer = null;
>   }
> }
> {code}
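Where the overhead comes from can be illustrated with a toy chunked byte store. `ChunkedBytes` below is hypothetical: it only assumes the general idea of a chunked buffer (fixed-size `byte[]` chunks held in a `byte[][]`) and is not Hive's `DynamicByteArray`. Every access has to map a logical offset to a chunk index plus an in-chunk offset, and an entry that straddles a chunk boundary must be copied piece by piece instead of being read as one contiguous slice.

```java
import java.util.Arrays;

/**
 * Hypothetical chunked byte store (illustration only, not Hive's code).
 * Growth adds a new fixed-size chunk, so existing bytes are never copied.
 */
final class ChunkedBytes {
  private final int chunkSize;
  private byte[][] chunks = new byte[0][];
  private int size = 0;

  ChunkedBytes(int chunkSize) {
    if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be > 0");
    this.chunkSize = chunkSize;
  }

  int size() {
    return size;
  }

  /** Appends src, allocating a new chunk only when all existing chunks are full. */
  void append(byte[] src) {
    int pos = 0;
    while (pos < src.length) {
      int chunk = size / chunkSize;
      int offset = size % chunkSize;
      if (chunk == chunks.length) {
        chunks = Arrays.copyOf(chunks, chunks.length + 1); // copies chunk refs, not bytes
        chunks[chunk] = new byte[chunkSize];
      }
      int n = Math.min(chunkSize - offset, src.length - pos);
      System.arraycopy(src, pos, chunks[chunk], offset, n);
      pos += n;
      size += n;
    }
  }

  /** Copies [offset, offset + len) into a new array, one chunk segment at a time. */
  byte[] read(int offset, int len) {
    if (offset < 0 || len < 0 || offset + len > size) throw new IndexOutOfBoundsException();
    byte[] out = new byte[len];
    int done = 0;
    while (done < len) {
      int pos = offset + done;
      int chunk = pos / chunkSize;   // which chunk holds this byte
      int inChunk = pos % chunkSize; // position inside that chunk
      int n = Math.min(chunkSize - inChunk, len - done);
      System.arraycopy(chunks[chunk], inChunk, out, done, n);
      done += n;
    }
    return out;
  }
}
```

For example, with a chunk size of 4, appending `"hello world"` and calling `read(2, 6)` returns the bytes of `"llo wo"`, assembled from two separate chunks.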
> Chunking the data offers no advantage on the read path: the buffer is sized up front from in.available() and there is no grow() operation, so the memory savings that chunked growth provides do not apply.
> !orc-stringdict-reader.png!
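A minimal sketch of the non-chunked alternative the summary asks for, assuming the dictionary's total byte size and per-entry lengths are known before reading (the excerpt above sizes its buffer from `in.available()`; ORC keeps string dictionary entry lengths in a separate LENGTH stream). `FlatStringDictionary` is a hypothetical name; this is not the change that was committed to Hive.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical flat dictionary: all entries live in one byte[], so each
 * lookup is a single contiguous slice with no chunk arithmetic.
 */
final class FlatStringDictionary {
  private final byte[] data;   // concatenated UTF-8 dictionary entries
  private final int[] offsets; // offsets[i] = start of entry i; offsets[n] = data.length

  FlatStringDictionary(InputStream in, int totalBytes, int[] lengths) throws IOException {
    data = new byte[totalBytes];
    // readFully loops until the array is filled, or throws EOFException.
    new DataInputStream(in).readFully(data);
    offsets = new int[lengths.length + 1];
    for (int i = 0; i < lengths.length; i++) {
      offsets[i + 1] = offsets[i] + lengths[i];
    }
    if (offsets[lengths.length] != totalBytes) {
      throw new IllegalArgumentException("entry lengths do not sum to totalBytes");
    }
  }

  /** Decodes entry i directly from the single backing array. */
  String get(int i) {
    return new String(data, offsets[i], offsets[i + 1] - offsets[i], StandardCharsets.UTF_8);
  }
}
```

Because the bytes are contiguous, `get(i)` needs only the offsets table; the trade-off is that the array cannot grow without a full copy, which the read path never needs.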
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)