[
https://issues.apache.org/jira/browse/HIVE-25190?focusedWorklogId=613756&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-613756
]
ASF GitHub Bot logged work on HIVE-25190:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 22/Jun/21 23:38
Start Date: 22/Jun/21 23:38
Worklog Time Spent: 10m
Work Description: pavibhai commented on a change in pull request #2408:
URL: https://github.com/apache/hive/pull/2408#discussion_r656651732
##########
File path:
storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java
##########
@@ -258,73 +262,56 @@ public void setValPreallocated(int elementNum, int length) {
public void setConcat(int elementNum, byte[] leftSourceBuf, int leftStart, int leftLen,
byte[] rightSourceBuf, int rightStart, int rightLen) {
int newLen = leftLen + rightLen;
- if ((nextFree + newLen) > buffer.length) {
- increaseBufferSpace(newLen);
- }
- vector[elementNum] = buffer;
- this.start[elementNum] = nextFree;
+ ensureValPreallocated(newLen);
+ vector[elementNum] = currentValue;
+ this.start[elementNum] = currentOffset;
this.length[elementNum] = newLen;
- System.arraycopy(leftSourceBuf, leftStart, buffer, nextFree, leftLen);
- nextFree += leftLen;
- System.arraycopy(rightSourceBuf, rightStart, buffer, nextFree, rightLen);
- nextFree += rightLen;
+ System.arraycopy(leftSourceBuf, leftStart, currentValue, currentOffset, leftLen);
+ System.arraycopy(rightSourceBuf, rightStart, currentValue,
+ currentOffset + leftLen, rightLen);
}
/**
- * Increase buffer space enough to accommodate next element.
+ * Allocate/reuse enough buffer space to accommodate next element.
+ * Updates the nextFree field to point to the start of the new record.
+ * If smallBuffer is used, smallBufferNextFree is updated.
+ *
* This uses an exponential increase mechanism to rapidly
* increase buffer size to enough to hold all data.
* As batches get re-loaded, buffer space allocated will quickly
* stabilize.
*
* @param nextElemLength size of next element to be added
+ * @return the buffer to use for the next element
*/
- public void increaseBufferSpace(int nextElemLength) {
- // A call to increaseBufferSpace() or ensureValPreallocated() will ensure that buffer[] points to
+ private byte[] allocateBuffer(int nextElemLength) {
+ // A call to ensureValPreallocated() will ensure that buffer[] points to
// a byte[] with sufficient space for the specified size.
- // This will either point to smallBuffer, or to a newly allocated byte array for larger values.
- if (nextElemLength > MAX_SIZE_FOR_SMALL_BUFFER) {
- // Larger allocations will be special-cased and will not use the normal buffer.
- // buffer/nextFree will be set to a newly allocated array just for the current row.
- // The next row will require another call to increaseBufferSpace() since this new buffer should be used up.
- byte[] newBuffer = new byte[nextElemLength];
+ // If this is a large value or small buffer is maxed out, allocate a
Review comment:
Should we allocate another shared buffer, or should we go with a single-use buffer?
If the incidence is low, staying with a single-use buffer keeps the code simple.
Otherwise, we could follow this pattern: if `length > MAX_SIZE_FOR_SMALL_ITEM`, use a
single-use buffer; otherwise, if `offset + length > MAX_SIZE_FOR_SHARED_BUFFER`,
allocate a new shared buffer.
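A minimal Java sketch of the policy suggested in the review comment, under stated assumptions: the class `SharedBufferAllocator`, its fields and methods, and the constant values are hypothetical (only the names `MAX_SIZE_FOR_SMALL_ITEM` and `MAX_SIZE_FOR_SHARED_BUFFER` come from the comment), and the values are deliberately tiny so the rollover is easy to see. Items larger than the small-item limit get a single-use array; smaller items are packed into a shared array, and when that array is full a fresh one is started instead of growing the old one. The `concat` method mirrors how the new `setConcat` copies both halves back to back starting at `currentOffset`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the allocation policy proposed in the review comment.
// Constant names come from the comment; the values are made-up, tiny numbers.
// Assumes MAX_SIZE_FOR_SMALL_ITEM <= MAX_SIZE_FOR_SHARED_BUFFER, so any small
// item always fits in a fresh shared buffer.
final class SharedBufferAllocator {
  static final int MAX_SIZE_FOR_SMALL_ITEM = 8;
  static final int MAX_SIZE_FOR_SHARED_BUFFER = 32;

  private byte[] sharedBuffer = new byte[MAX_SIZE_FOR_SHARED_BUFFER];
  private int sharedNextFree = 0;
  int sharedBuffersAllocated = 1;

  // Set by allocate(): the array and offset where the next value must be written.
  byte[] currentValue;
  int currentOffset;

  void allocate(int length) {
    if (length > MAX_SIZE_FOR_SMALL_ITEM) {
      // Large item: single-use array; the shared buffer is left untouched.
      currentValue = new byte[length];
      currentOffset = 0;
      return;
    }
    if (sharedNextFree + length > MAX_SIZE_FOR_SHARED_BUFFER) {
      // Shared buffer is full: start a new one rather than growing the old one,
      // so no single shared array ever exceeds MAX_SIZE_FOR_SHARED_BUFFER.
      sharedBuffer = new byte[MAX_SIZE_FOR_SHARED_BUFFER];
      sharedNextFree = 0;
      sharedBuffersAllocated++;
    }
    currentValue = sharedBuffer;
    currentOffset = sharedNextFree;
    sharedNextFree += length;
  }

  // Mirrors the new setConcat(): reserve room for both halves, then copy them
  // back to back starting at currentOffset.
  void concat(byte[] left, byte[] right) {
    allocate(left.length + right.length);
    System.arraycopy(left, 0, currentValue, currentOffset, left.length);
    System.arraycopy(right, 0, currentValue, currentOffset + left.length, right.length);
  }

  public static void main(String[] args) {
    SharedBufferAllocator a = new SharedBufferAllocator();
    a.allocate(20);                                // 20 > 8: single-use array
    System.out.println(a.currentValue.length);     // prints 20
    for (int i = 0; i < 5; i++) {
      a.allocate(8);                               // 40 bytes do not fit in one 32-byte buffer
    }
    System.out.println(a.sharedBuffersAllocated);  // prints 2
    a.concat("ab".getBytes(StandardCharsets.US_ASCII), "cd".getBytes(StandardCharsets.US_ASCII));
    System.out.println(new String(a.currentValue, a.currentOffset, 4, StandardCharsets.US_ASCII)); // prints abcd
  }
}
```

Because each shared array is capped, the total bytes in a batch can exceed the size of any single array. Values simply end up spread over several arrays, which the column vector already tolerates, since each element records its own array in `vector[]` and its own offset in `start[]`.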
Issue Time Tracking
-------------------
Worklog Id: (was: 613756)
Time Spent: 1.5h (was: 1h 20m)
> BytesColumnVector fails when the aggregate size is > 1gb
> --------------------------------------------------------
>
> Key: HIVE-25190
> URL: https://issues.apache.org/jira/browse/HIVE-25190
> Project: Hive
> Issue Type: Bug
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> Currently, BytesColumnVector allocates a buffer for small values (< 1 MB),
> but fails with:
> {code:java}
> new RuntimeException("Overflow of newLength. smallBuffer.length="
> + smallBuffer.length + ", nextElemLength=" + nextElemLength);
> {code}
> if the aggregate size of the buffer exceeds 1 GB.
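The 1 GB threshold in the description is consistent with a single buffer that grows by doubling: once it holds 2^30 bytes, the next doubling needs 2^31 bytes, which is larger than `Integer.MAX_VALUE` (the upper bound on a Java array length) and wraps to a negative number in `int` arithmetic. The sketch below illustrates that arithmetic under this assumption; `BufferGrowth`, `nextCapacity`, and `naiveDouble` are hypothetical names, the exception text is borrowed from the description, and this is not Hive's actual `increaseBufferSpace()` code.

```java
// Hypothetical sketch of doubling growth for one byte[] buffer; this is not
// Hive's increaseBufferSpace() code. The exception text is borrowed from the issue.
final class BufferGrowth {
  // Returns the capacity needed to hold `used` bytes plus the next element,
  // doubling from currentLength, and refuses sizes a Java array cannot have.
  static int nextCapacity(int currentLength, int used, int nextElemLength) {
    long required = (long) used + nextElemLength;  // long arithmetic: the sum cannot wrap
    long newLength = Math.max(currentLength, 1);
    while (newLength < required) {
      newLength *= 2;                              // exponential increase
    }
    if (newLength > Integer.MAX_VALUE) {
      // Array lengths are ints, so one byte[] cannot hold 2^31 bytes.
      throw new RuntimeException("Overflow of newLength. length=" + currentLength
          + ", nextElemLength=" + nextElemLength);
    }
    return (int) newLength;
  }

  // The same doubling done in int arithmetic: 2^30 * 2 wraps to a negative value.
  static int naiveDouble(int length) {
    return length * 2;
  }

  public static void main(String[] args) {
    System.out.println(nextCapacity(1024, 1000, 100)); // prints 2048
    System.out.println(naiveDouble(1 << 30));          // prints -2147483648
    try {
      nextCapacity(1 << 30, 1 << 30, 1);               // one byte past a full 1 GB buffer
    } catch (RuntimeException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With a power-of-two starting size, a single doubled buffer can never grow beyond 2^30 bytes, whatever the heap size; spreading values across several capped arrays avoids that ceiling.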