[
https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15454112#comment-15454112
]
Matt McCline commented on HIVE-14451:
-------------------------------------
There are 2 improvements in the patch.
First, when the input bytes being deserialized are immutable and it is safe to
retain references to them (e.g. a hash table entry), VectorDeserializeRow has an
alternate deserializeByRef method that can be called. This avoids an
unnecessary buffer copy operation.
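To illustrate the difference between copying and keeping a reference, here is a
minimal sketch. ToyBytesColumn and its setVal/setRef methods are hypothetical
stand-ins written for this example; they are not the code in the patch and only
assume that a bytes column stores a (buffer, start, length) triple per row.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a bytes column: one (buffer, start, length) triple per row.
final class ToyBytesColumn {
    final byte[][] buffers;
    final int[] starts;
    final int[] lengths;

    ToyBytesColumn(int rows) {
        buffers = new byte[rows][];
        starts = new int[rows];
        lengths = new int[rows];
    }

    // Copy mode: the column keeps a private copy, so the caller may reuse its buffer.
    void setVal(int row, byte[] src, int start, int length) {
        buffers[row] = Arrays.copyOfRange(src, start, start + length);
        starts[row] = 0;
        lengths[row] = length;
    }

    // By-reference mode: the column borrows the caller's buffer; no bytes are copied.
    // Only safe when the caller guarantees the source bytes are never mutated.
    void setRef(int row, byte[] src, int start, int length) {
        buffers[row] = src;
        starts[row] = start;
        lengths[row] = length;
    }

    String getString(int row) {
        return new String(buffers[row], starts[row], lengths[row], StandardCharsets.UTF_8);
    }
}

final class ByRefDemo {
    public static void main(String[] args) {
        byte[] input = "hive,vector".getBytes(StandardCharsets.UTF_8);
        ToyBytesColumn col = new ToyBytesColumn(2);
        col.setVal(0, input, 0, 4); // copies the first field
        col.setRef(1, input, 5, 6); // borrows the second field in place
        System.out.println(col.getString(0) + " " + col.getString(1));
        System.out.println(col.buffers[1] == input); // borrowed row shares the input array
    }
}
```

The by-reference row stays valid only as long as the input array is left
untouched, which is why this mode is limited to immutable sources.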
Second, when BinarySortable and LazySimple have to "unescape" data in the input
buffer to produce the string/char/varchar/binary result, a preallocation scheme
is used: the (scratch) buffer in BytesColumnVector is made available to be
used directly as the target buffer. This avoids an extra buffer copy operation.
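A rough sketch of the preallocation idea follows. ToyScratch, reserve, commit
and unescapeInto are hypothetical names, and the backslash escaping rule is an
assumption chosen for brevity; the real serde formats escape differently. The
point is only that the unescaped bytes are written straight into the column's
scratch buffer rather than into a temporary array that is copied afterwards.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical scratch buffer owned by the column; values are appended back to back.
final class ToyScratch {
    byte[] buffer = new byte[16];
    int used = 0;

    // Make room for up to maxLength more bytes and return the offset to write at.
    int reserve(int maxLength) {
        if (used + maxLength > buffer.length) {
            byte[] bigger = new byte[Math.max(buffer.length * 2, used + maxLength)];
            System.arraycopy(buffer, 0, bigger, 0, used);
            buffer = bigger;
        }
        return used;
    }

    // Record how many of the reserved bytes were actually written.
    void commit(int actualLength) {
        used += actualLength;
    }
}

final class UnescapeDemo {
    // Toy rule (an assumption): a backslash makes the following byte literal.
    // Writes the unescaped bytes into dst at dstStart and returns the count written.
    static int unescapeInto(byte[] src, int start, int length, byte[] dst, int dstStart) {
        int out = dstStart;
        int end = start + length;
        for (int i = start; i < end; i++) {
            byte b = src[i];
            if (b == '\\' && i + 1 < end) {
                b = src[++i];
            }
            dst[out++] = b;
        }
        return out - dstStart;
    }

    public static void main(String[] args) {
        byte[] input = "a\\,b".getBytes(StandardCharsets.UTF_8); // bytes: a \ , b
        ToyScratch scratch = new ToyScratch();
        // Unescaped output is never longer than the escaped input, so reserve that much.
        int offset = scratch.reserve(input.length);
        int written = unescapeInto(input, 0, input.length, scratch.buffer, offset);
        scratch.commit(written);
        System.out.println(new String(scratch.buffer, offset, written, StandardCharsets.UTF_8));
    }
}
```

Reserving the escaped length up front is what lets the unescape loop write
without bounds checks against a growing buffer; only the bytes actually
produced are committed.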
> Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow
> --------------------------------------------------------------------------
>
> Key: HIVE-14451
> URL: https://issues.apache.org/jira/browse/HIVE-14451
> Project: Hive
> Issue Type: Improvement
> Components: Vectorization
> Reporter: Gopal V
> Assignee: Matt McCline
> Attachments: HIVE-14451.01.patch, HIVE-14451.02.patch
>
>
> In a majority of cases, when using the OptimizedHashMap, the references to
> the byte[] are immutable.
> The hashmap result always allocates on boundary conditions, but never mutates
> a previous buffer.
> Copying Strings out of the hashtable is entirely wasteful and it would be
> easy to know when the currentBytes is a borrowed slice from the original
> input.