[jira] [Updated] (HIVE-11665) ORC StringDictionaryReader should not use Chunked buffers

Gopal V (JIRA) Wed, 26 Aug 2015 23:44:09 -0700

     [ https://issues.apache.org/jira/browse/HIVE-11665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Gopal V updated HIVE-11665:
---------------------------
    Description: 
The ORC string dictionary reader is slow because it reads the dictionary stream into a chunked buffer (DynamicByteArray):

{code}
private void readDictionaryStream(InStream in) throws IOException {
  if (in != null) { // Guard against empty dictionary stream.
    if (in.available() > 0) {
      dictionaryBuffer = new DynamicByteArray(64, in.available());
      dictionaryBuffer.readAll(in);
      // This is the start of a stripe, so invalidate the cache.
      dictionaryBufferInBytesCache = null;
    }
    in.close();
  } else {
    dictionaryBuffer = null;
  }
}
{code}

Chunking the data offers no advantage on the read path. Chunks exist to save memory when a buffer has to grow(), but the reader sizes the dictionary buffer once from in.available() and never grows it.
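A minimal, hypothetical sketch of the difference in plain Java (not Hive code): ChunkedBytes, readFlat and entry are illustrative names, not the DynamicByteArray or InStream API, and the per-entry dictionary offsets are assumed to be known. The chunked store allocates chunks on demand and must stitch an entry back together when it straddles a chunk boundary; the flat read allocates one array of the known size and decodes any entry in place.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these names are illustrative, not Hive's DynamicByteArray/InStream API.
public class DictionaryReadSketch {

  /** Chunked store: grows by adding fixed-size chunks instead of reallocating and copying. */
  static final class ChunkedBytes {
    private final int chunkSize;
    private final List<byte[]> chunks = new ArrayList<>();
    private int length;

    ChunkedBytes(int chunkSize) {
      this.chunkSize = chunkSize;
    }

    /** Appends the rest of the stream, allocating a new chunk whenever the last one fills up. */
    void readAll(InputStream in) throws IOException {
      int n;
      do {
        int chunkIndex = length / chunkSize;
        if (chunkIndex == chunks.size()) {
          chunks.add(new byte[chunkSize]); // the "grow" step
        }
        int offset = length % chunkSize;
        n = in.read(chunks.get(chunkIndex), offset, chunkSize - offset);
        if (n > 0) {
          length += n;
        }
      } while (n >= 0);
    }

    /** Copies bytes [start, start + len) into a new array, stitching across chunk boundaries. */
    byte[] get(int start, int len) {
      byte[] out = new byte[len];
      int copied = 0;
      while (copied < len) {
        int pos = start + copied;
        int offset = pos % chunkSize;
        int n = Math.min(len - copied, chunkSize - offset);
        System.arraycopy(chunks.get(pos / chunkSize), offset, out, copied, n);
        copied += n;
      }
      return out;
    }
  }

  /** Flat read path: the size is known up front, so one allocation and no grow() at all. */
  static byte[] readFlat(InputStream in, int size) throws IOException {
    byte[] buffer = new byte[size];
    new DataInputStream(in).readFully(buffer); // throws EOFException if the stream is short
    return buffer;
  }

  /** Decodes dictionary entry i directly from the flat buffer (offsets are assumed given). */
  static String entry(byte[] flat, int[] offsets, int i) {
    int start = offsets[i];
    int end = (i + 1 < offsets.length) ? offsets[i + 1] : flat.length;
    return new String(flat, start, end - start, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "applebananacherry".getBytes(StandardCharsets.UTF_8);
    int[] offsets = {0, 5, 11}; // "apple", "banana", "cherry"

    ChunkedBytes chunked = new ChunkedBytes(4);
    chunked.readAll(new ByteArrayInputStream(data));
    // "banana" is bytes 5..10, split across chunk 1 (bytes 4..7) and chunk 2 (bytes 8..11).
    String viaChunks = new String(chunked.get(5, 6), StandardCharsets.UTF_8);

    byte[] flat = readFlat(new ByteArrayInputStream(data), data.length);
    String viaFlat = entry(flat, offsets, 1);

    System.out.println(viaChunks + " " + viaFlat); // prints "banana banana"
  }
}
{code}

With a 4-byte chunk size, "banana" straddles two chunks, so the chunked get() needs a fresh array plus two copies, whereas the flat buffer decodes the same entry with a single String constructor call.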

!orc-stringdict-reader.png!


> ORC StringDictionaryReader should not use Chunked buffers
> ---------------------------------------------------------
>
>                 Key: HIVE-11665
>                 URL: https://issues.apache.org/jira/browse/HIVE-11665
>             Project: Hive
>          Issue Type: Improvement
>          Components: File Formats
>    Affects Versions: 1.3.0, 2.0.0
>            Reporter: Gopal V
>            Assignee: Prasanth Jayachandran
>         Attachments: orc-stringdict-reader.png



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
