[jira] [Work logged] (HIVE-25190) BytesColumnVector fails when the aggregate size is > 1gb

ASF GitHub Bot (Jira) Tue, 22 Jun 2021 15:39:07 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-25190?focusedWorklogId=613733&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-613733
 ]


ASF GitHub Bot logged work on HIVE-25190:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Jun/21 22:38
            Start Date: 22/Jun/21 22:38
    Worklog Time Spent: 10m 
      Work Description: omalley commented on a change in pull request #2408:
URL: https://github.com/apache/hive/pull/2408#discussion_r656627997



##########
File path: storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java
##########
@@ -121,30 +125,30 @@ public void setRef(int elementNum, byte[] sourceBuf, int start, int length) {
    * Provide the estimated number of bytes needed to hold
    * a full column vector worth of byte string data.
    *
-   * @param estimatedValueSize  Estimated size of buffer space needed
+   * @param estimatedValueSize  Estimated size of buffer space needed per row
    */
   public void initBuffer(int estimatedValueSize) {
-    nextFree = 0;
     smallBufferNextFree = 0;
 
     // if buffer is already allocated, keep using it, don't re-allocate
-    if (buffer != null) {
+    if (smallBuffer != null) {
       // Free up any previously allocated buffers that are referenced by vector
       if (bufferAllocationCount > 0) {
         for (int idx = 0; idx < vector.length; ++idx) {
           vector[idx] = null;
           length[idx] = 0;
         }
-        buffer = smallBuffer; // In case last row was a large bytes value
       }
     } else {
       // allocate a little extra space to limit need to re-allocate
-      int bufferSize = this.vector.length * (int)(estimatedValueSize * EXTRA_SPACE_FACTOR);
+      long bufferSize = (long) (this.vector.length * estimatedValueSize * EXTRA_SPACE_FACTOR);
       if (bufferSize < DEFAULT_BUFFER_SIZE) {
         bufferSize = DEFAULT_BUFFER_SIZE;
       }
-      buffer = new byte[bufferSize];
-      smallBuffer = buffer;
+      if (bufferSize > MAX_SIZE_FOR_SMALL_BUFFER) {

Review comment:
       In initBuffer, we use the estimated value size to compute the initial 
size of the backing buffer. Any value larger than MAX_SIZE_FOR_SMALL_ITEM is 
put into a temporary buffer instead. This check makes sure that a large 
estimate does not cause us to allocate a huge backing buffer.
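
A minimal sketch of the sizing rule described in this comment, written as a standalone class rather than the actual BytesColumnVector code. The class name, the helper initialBufferSize, and the values given to DEFAULT_BUFFER_SIZE and EXTRA_SPACE_FACTOR are assumptions for illustration; the 1gb cap is taken from the issue title. The sketch also widens to long before multiplying, which is a choice made here and not something shown in the diff.

```java
public class InitialBufferSizeSketch {
    // Assumed values for illustration only; this message does not show the real constants.
    static final int DEFAULT_BUFFER_SIZE = 16 * 1024;
    static final float EXTRA_SPACE_FACTOR = 1.2f;
    // 1gb, following the limit named in the issue title.
    static final int MAX_SIZE_FOR_SMALL_BUFFER = 1024 * 1024 * 1024;

    /** Hypothetical helper: initial size of the backing buffer for rowCount rows. */
    static int initialBufferSize(int rowCount, int estimatedValueSize) {
        // Widen to long first so a large estimate cannot wrap around in int arithmetic.
        long bufferSize = (long) ((long) rowCount * estimatedValueSize * EXTRA_SPACE_FACTOR);
        if (bufferSize < DEFAULT_BUFFER_SIZE) {
            bufferSize = DEFAULT_BUFFER_SIZE;
        }
        // The check under review: never let a large estimate produce a huge allocation.
        if (bufferSize > MAX_SIZE_FOR_SMALL_BUFFER) {
            bufferSize = MAX_SIZE_FOR_SMALL_BUFFER;
        }
        return (int) bufferSize;
    }

    public static void main(String[] args) {
        System.out.println(initialBufferSize(1024, 10));                // small estimate
        System.out.println(initialBufferSize(1024, Integer.MAX_VALUE)); // very large estimate
    }
}
```

With these assumed constants, a small estimate is raised to the default size and a very large estimate is capped, so the final cast back to int is always safe.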




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 613733)
    Time Spent: 1h 10m  (was: 1h)

> BytesColumnVector fails when the aggregate size is > 1gb
> --------------------------------------------------------
>
>                 Key: HIVE-25190
>                 URL: https://issues.apache.org/jira/browse/HIVE-25190
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently, BytesColumnVector will allocate a buffer for small values (< 1mb), 
> but fail with:
> {code:java}
> new RuntimeException("Overflow of newLength. smallBuffer.length="
>                 + smallBuffer.length + ", nextElemLength=" + nextElemLength);
> {code}
> if the aggregate size of the buffer crosses over 1gb. 
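
One way such a failure can arise, sketched under the assumption that the buffer grows by doubling an int length (the growth code itself is not shown in this message). The class and the helper grow are hypothetical; only the exception text is taken from the description above.

```java
public class DoublingOverflowSketch {
    /** Hypothetical helper: double an int capacity until it can hold `needed` bytes. */
    static int grow(int currentLength, int needed) {
        int newLength = currentLength;
        while (newLength < needed) {
            if (newLength > 0) {
                // Doubling 2^30 yields 2^31, which wraps to a negative int.
                newLength *= 2;
            } else {
                throw new RuntimeException("Overflow of newLength. smallBuffer.length="
                        + currentLength + ", nextElemLength=" + needed);
            }
        }
        return newLength;
    }

    public static void main(String[] args) {
        System.out.println(grow(16 * 1024, 20_000)); // one doubling is enough
        try {
            grow(1 << 30, (1 << 30) + 1); // already at 1gb: the next doubling wraps
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this assumption, a buffer that has reached 1gb cannot be doubled again, so the next append that needs more room raises the exception instead of allocating.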



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
