[ 
https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15454112#comment-15454112
 ] 

Matt McCline edited comment on HIVE-14451 at 9/1/16 3:00 AM:
-------------------------------------------------------------

There are 2 improvements in the patch.

First, when the input bytes being deserialized are immutable and it is safe to 
retain references (e.g. a hash table entry), VectorDeserializeRow has an 
alternate deserializeByRef method that can be called.  This avoids an 
unnecessary buffer copy operation.  It applies to Native Vector MapJoin for 
small table data (LazyBinary).
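
To illustrate the difference between the copying path and the by-reference 
path, here is a minimal sketch.  ToyBytesColumn is a hypothetical stand-in 
written for this example; it is not Hive's BytesColumnVector, and the method 
names only mirror the idea of "set by value" versus "set by reference".

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical toy column of byte slices; NOT Hive's BytesColumnVector. */
final class ToyBytesColumn {
    final byte[][] vector;   // backing array for each row
    final int[] start;       // offset of the row's bytes in its backing array
    final int[] length;      // number of bytes in the row's value

    private byte[] scratch = new byte[16]; // column-owned buffer for copies
    private int scratchUsed = 0;

    ToyBytesColumn(int rows) {
        vector = new byte[rows][];
        start = new int[rows];
        length = new int[rows];
    }

    /** Copy mode: safe when the caller may reuse or overwrite src later. */
    void setVal(int row, byte[] src, int off, int len) {
        if (scratchUsed + len > scratch.length) {
            // Earlier rows keep pointing at the old scratch array, which
            // still holds their bytes, so a fresh buffer can be started.
            scratch = new byte[Math.max(scratch.length * 2, len)];
            scratchUsed = 0;
        }
        System.arraycopy(src, off, scratch, scratchUsed, len);
        vector[row] = scratch;
        start[row] = scratchUsed;
        length[row] = len;
        scratchUsed += len;
    }

    /** ByRef mode: no copy; only valid while src is never mutated. */
    void setRef(int row, byte[] src, int off, int len) {
        vector[row] = src;
        start[row] = off;
        length[row] = len;
    }

    String get(int row) {
        return new String(vector[row], start[row], length[row],
                          StandardCharsets.UTF_8);
    }
}
```

In this sketch setRef costs three assignments, while setVal costs an 
arraycopy per value.  The by-reference form is only correct under the 
assumption stated above: the source bytes are immutable for as long as the 
column is read.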

Also, when BinarySortable and LazySimple have to "unescape" data in the input 
buffer to produce the string/char/varchar/binary result, a preallocation scheme 
is used where the (scratch) buffer in BytesColumnVector is made available to be 
used directly as the target buffer.  This avoids an extra buffer copy 
operation.  It applies when vectorizing text files (LazySimple) and when 
deserializing for Vectorized Reduce (BinarySortable/LazyBinary).  Also, I 
suspect there is a bug in the way string/char/varchar/binary are handled today 
for BinarySortable that *always* causes an extra copy...
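
The preallocation idea can be sketched as follows.  Everything here is an 
assumption made for illustration: ToyScratchColumn, its reserve/commit 
methods, and the backslash escape scheme are hypothetical and are not taken 
from the patch or from Hive's serde formats.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical toy column exposing its scratch buffer; NOT Hive's API. */
final class ToyScratchColumn {
    final byte[][] vector;
    final int[] start;
    final int[] length;

    private byte[] buffer = new byte[64];
    private int nextFree = 0;

    ToyScratchColumn(int rows) {
        vector = new byte[rows][];
        start = new int[rows];
        length = new int[rows];
    }

    /** Make room for up to maxLen bytes and return the write offset. */
    int reserve(int maxLen) {
        if (nextFree + maxLen > buffer.length) {
            // Rows already committed keep referencing the old buffer.
            buffer = new byte[Math.max(buffer.length * 2, maxLen)];
            nextFree = 0;
        }
        return nextFree;
    }

    byte[] buffer() {
        return buffer;
    }

    /** Record that actualLen bytes were written at the reserved offset. */
    void commit(int row, int actualLen) {
        vector[row] = buffer;
        start[row] = nextFree;
        length[row] = actualLen;
        nextFree += actualLen;
    }

    String get(int row) {
        return new String(vector[row], start[row], length[row],
                          StandardCharsets.UTF_8);
    }
}

final class ToyUnescape {
    static final byte ESCAPE = '\\'; // assumed escape byte for this sketch

    /** Unescape src[off, off+len) straight into dst; returns bytes written. */
    static int unescapeInto(byte[] src, int off, int len,
                            byte[] dst, int dstOff) {
        int out = dstOff;
        int end = off + len;
        for (int i = off; i < end; i++) {
            byte b = src[i];
            if (b == ESCAPE && i + 1 < end) {
                b = src[++i]; // keep the escaped byte, drop the escape marker
            }
            dst[out++] = b;
        }
        return out - dstOff;
    }

    /** Unescaped output is never longer than the input, so len is enough. */
    static void deserializeField(ToyScratchColumn col, int row,
                                 byte[] src, int off, int len) {
        int dstOff = col.reserve(len);
        int written = unescapeInto(src, off, len, col.buffer(), dstOff);
        col.commit(row, written);
    }
}
```

The point of the sketch is that the unescape loop writes once, directly into 
the column's own buffer.  Without the reserve/commit pair, the unescaped 
bytes would first land in a temporary array and then be copied into the 
column a second time.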


was (Author: mmccline):
There are 2 improvements in the patch.

First, when the input bytes being deserialized are immutable and it is safe to 
retain references (e.g. hash table entry), the VectorDeserializeRow has an 
alternate deserializeByRef method that can be called.  This avoids an 
unnecessary buffer copy operation.

Also, when BinarySortable and LazySimple have to "unescape" data in the input 
buffer to produce the string/char/varchar/binary result, a preallocation scheme 
is used where the (scratch) buffer in BytesColumnVector is made available to be 
used directly as the target buffer.  This avoids an extra buffer copy operation.

> Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow
> --------------------------------------------------------------------------
>
>                 Key: HIVE-14451
>                 URL: https://issues.apache.org/jira/browse/HIVE-14451
>             Project: Hive
>          Issue Type: Improvement
>          Components: Vectorization
>            Reporter: Gopal V
>            Assignee: Matt McCline
>         Attachments: HIVE-14451.01.patch, HIVE-14451.02.patch
>
>
> In a majority of cases, when using the OptimizedHashMap, the references to 
> the byte[] are immutable. 
> The hashmap result always allocates on boundary conditions, but never mutates 
> a previous buffer.
> Copying Strings out of the hashtable is entirely wasteful and it would be 
> easy to know when the currentBytes is a borrowed slice from the original 
> input.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
