[ https://issues.apache.org/jira/browse/HIVE-11665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HIVE-11665:
---------------------------
Description:
The ORC string dictionary reader is slow because the dictionary stream is read into a chunked buffer (DynamicByteArray).
{code}
private void readDictionaryStream(InStream in) throws IOException {
  if (in != null) { // Guard against empty dictionary stream.
    if (in.available() > 0) {
      dictionaryBuffer = new DynamicByteArray(64, in.available());
      dictionaryBuffer.readAll(in);
      // Since it's the start of a stripe, invalidate the cache.
      dictionaryBufferInBytesCache = null;
    }
    in.close();
  } else {
    dictionaryBuffer = null;
  }
}
{code}
Chunking offers no advantage on the read path: its benefit is memory savings when the buffer has to grow(), and the reader never grows the dictionary buffer.
!orc-stringdict-reader.png!
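To make the argument concrete, here is a minimal, self-contained sketch. The class and method names (FlatDictionarySketch, ChunkedBytes, readFlat, getFlat) are hypothetical; this is not Hive's DynamicByteArray, nor the change committed for this issue. It contrasts a generic chunked layout, where every lookup maps a flat offset to a chunk index plus an offset within that chunk and may have to copy across a chunk boundary, with one contiguous array sized up front. The snippet above already knows the size (in.available()) before reading, which is what makes the flat layout possible. The tiny chunk size in main is chosen only to force lookups to straddle chunk boundaries.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: not Hive's DynamicByteArray or its actual fix.
public class FlatDictionarySketch {

  // Chunked layout: bytes are spread over fixed-size chunks, so every
  // lookup translates a flat offset into (chunk, offsetInChunk) and may
  // have to stitch together bytes that straddle a chunk boundary.
  static final class ChunkedBytes {
    private final byte[][] chunks;
    private final int chunkSize;

    ChunkedBytes(byte[] src, int chunkSize) {
      this.chunkSize = chunkSize;
      int n = (src.length + chunkSize - 1) / chunkSize;
      chunks = new byte[n][];
      for (int i = 0; i < n; i++) {
        int start = i * chunkSize;
        int len = Math.min(chunkSize, src.length - start);
        chunks[i] = new byte[chunkSize]; // every chunk is allocated at full size
        System.arraycopy(src, start, chunks[i], 0, len);
      }
    }

    String get(int offset, int length) {
      byte[] out = new byte[length];
      int copied = 0;
      while (copied < length) {
        int pos = offset + copied;
        int chunk = pos / chunkSize;
        int inChunk = pos % chunkSize;
        int n = Math.min(length - copied, chunkSize - inChunk);
        System.arraycopy(chunks[chunk], inChunk, out, copied, n);
        copied += n;
      }
      return new String(out, StandardCharsets.UTF_8);
    }
  }

  // Flat layout: the size is known before reading, so one contiguous
  // array suffices and no growth (and hence no chunking) is needed.
  static byte[] readFlat(InputStream in, int size) throws IOException {
    byte[] buf = new byte[size];
    new DataInputStream(in).readFully(buf); // throws EOFException if short
    return buf;
  }

  // A lookup is a single String construction over the shared array.
  static String getFlat(byte[] flat, int offset, int length) {
    return new String(flat, offset, length, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    byte[] dict = "applebananacherry".getBytes(StandardCharsets.UTF_8);
    int[] offsets = {0, 5, 11};
    int[] lengths = {5, 6, 6};

    ChunkedBytes chunked = new ChunkedBytes(dict, 4); // tiny chunks force straddling
    byte[] flat = readFlat(new ByteArrayInputStream(dict), dict.length);

    for (int i = 0; i < offsets.length; i++) {
      System.out.println(chunked.get(offsets[i], lengths[i]) + " "
          + getFlat(flat, offsets[i], lengths[i]));
    }
  }
}
{code}

Both layouts return the same entries (main prints "apple apple", "banana banana", "cherry cherry"); the difference is that the flat path skips the per-lookup chunk arithmetic and boundary copying.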
> ORC StringDictionaryReader should not use Chunked buffers
> ----------------------------------------------------------
>
> Key: HIVE-11665
> URL: https://issues.apache.org/jira/browse/HIVE-11665
> Project: Hive
> Issue Type: Improvement
> Components: File Formats
> Affects Versions: 1.3.0, 2.0.0
> Reporter: Gopal V
> Assignee: Prasanth Jayachandran
> Attachments: orc-stringdict-reader.png
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)