[
https://issues.apache.org/jira/browse/HIVE-15700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838592#comment-15838592
]
Matt McCline commented on HIVE-15700:
-------------------------------------
Ok, I’ve been thinking about this.
First, I think the old code’s idea of doubling the buffer and copying old data
isn’t great. We are copying old data that doesn’t need to be copied – the old
byte[][] vector is still referencing that old buffer.
The memory usage of BytesColumnVector is cyclical. Rather than doubling the
buffer I think we should just allocate another buffer with the old size and
start at the beginning of it. So for this cycle we’d have more than one
buffer. If the new request is so large the old size is inadequate, then of
course take the max of the old buffer size and the new request size. In
effect, very large requests would just get their own buffer.
Add more bookkeeping to track, in a long member, the largest amount of data
used during the cycle.
When we reset the BytesColumnVector, we can then decide what to do with the
current buffer. Perhaps the old buffer needs to be let go and a new larger
buffer allocated. But if we have been allocating a bunch of large buffers for
large requests, perhaps we let the old buffer be.
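To make the proposal concrete, here is a rough sketch of how it might look. It
is not the actual BytesColumnVector code: the class ChunkedBytesBuffer, its
fields, and the MAX_RETAINED_SIZE cap are all hypothetical names made up for
illustration, and the reset policy shown is only one possible reading of the
"decide what to do with the current buffer" step.

```java
import java.util.Arrays;

/** Hypothetical sketch of the proposal; not the real BytesColumnVector. */
class ChunkedBytesBuffer {
    // Assumed cap: above this we do not grow the retained buffer on reset.
    static final int MAX_RETAINED_SIZE = 1 << 24;

    final byte[][] vector;   // per-row reference to the buffer holding the value
    final int[] start;       // per-row offset into that buffer
    final int[] length;      // per-row value length

    private byte[] current;          // buffer currently being filled
    private int nextFree;            // next free offset in current
    private int chunkSize;           // the "old size" reused for extra buffers
    private long bytesUsedThisCycle; // bookkeeping in a long, as suggested
    private int buffersThisCycle = 1;

    ChunkedBytesBuffer(int rows, int initialSize) {
        vector = new byte[rows][];
        start = new int[rows];
        length = new int[rows];
        chunkSize = initialSize;
        current = new byte[initialSize];
    }

    void setVal(int row, byte[] src, int srcPos, int len) {
        if (len > current.length - nextFree) {
            // No doubling and no copying: earlier rows keep referencing the
            // old buffer. Oversized requests simply get their own buffer.
            current = new byte[Math.max(chunkSize, len)];
            nextFree = 0;
            buffersThisCycle++;
        }
        System.arraycopy(src, srcPos, current, nextFree, len);
        vector[row] = current;
        start[row] = nextFree;
        length[row] = len;
        nextFree += len;
        bytesUsedThisCycle += len;
    }

    void reset() {
        Arrays.fill(vector, null); // drop references so old buffers can be freed
        if (bytesUsedThisCycle > chunkSize
                && bytesUsedThisCycle <= MAX_RETAINED_SIZE) {
            // The cycle needed more than one chunk: start the next cycle larger.
            chunkSize = (int) bytesUsedThisCycle;
        }
        if (current.length != chunkSize) {
            // Either we grew, or current was a one-off oversized buffer.
            current = new byte[chunkSize];
        }
        nextFree = 0;
        bytesUsedThisCycle = 0;
        buffersThisCycle = 1;
    }

    int chunkSize() { return chunkSize; }
    int bufferCount() { return buffersThisCycle; }
}
```

With a layout like this, each setVal copies only the new value, so the total
copying in a cycle is proportional to the data written rather than to the
number of times the buffer had to grow.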
> BytesColumnVector can get stuck trying to resize byte buffer
> ------------------------------------------------------------
>
> Key: HIVE-15700
> URL: https://issues.apache.org/jira/browse/HIVE-15700
> Project: Hive
> Issue Type: Bug
> Components: Vectorization
> Reporter: Jason Dere
> Assignee: Jason Dere
> Attachments: HIVE-15700.1.patch
>
>
> While looking at HIVE-15698, hit an issue where one of the reducers was stuck
> in the following stack trace:
> {noformat}
> Thread 12735: (state = IN_JAVA)
> -
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.increaseBufferSpace(int)
> @bci=22, line=245 (Compiled frame; information may be imprecise)
> - org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(int,
> byte[], int, int) @bci=18, line=150 (Interpreted frame)
> -
> org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.storeRowColumn(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,
> int, int, boolean) @bci=536, line=442 (Compiled frame)
> -
> org.apache.hadoop.hive.ql.exec.vector.VectorDeserializeRow.deserialize(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,
> int) @bci=110, line=761 (Interpreted frame)
> -
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(org.apache.hadoop.io.BytesWritable,
> java.lang.Iterable, byte) @bci=184, line=444 (Interpreted frame)
> - org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector()
> @bci=119, line=388 (Interpreted frame)
> - org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord() @bci=8,
> line=239 (Interpreted frame)
> - org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run() @bci=124,
> line=319 (Interpreted frame)
> -
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(java.util.Map,
> java.util.Map) @bci=30, line=185 (Interpreted frame)
> - org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(java.util.Map,
> java.util.Map) @bci=159, line=168 (Interpreted frame)
> - org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run() @bci=65,
> line=370 (Interpreted frame)
> - org.apache.tez.runtime.task.TaskRunner2Callable$1.run() @bci=133, line=73
> (Interpreted frame)
> - org.apache.tez.runtime.task.TaskRunner2Callable$1.run() @bci=1, line=61
> (Interpreted frame)
> -
> java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction,
> java.security.AccessControlContext) @bci=0 (Compiled frame)
> - javax.security.auth.Subject.doAs(javax.security.auth.Subject,
> java.security.PrivilegedExceptionAction) @bci=42, line=422 (Interpreted frame)
> -
> org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction)
> @bci=14, line=1724 (Interpreted frame)
> - org.apache.tez.runtime.task.TaskRunner2Callable.callInternal() @bci=38,
> line=61 (Interpreted frame)
> - org.apache.tez.runtime.task.TaskRunner2Callable.callInternal() @bci=1,
> line=37 (Interpreted frame)
> - org.apache.tez.common.CallableWithNdc.call() @bci=8, line=36 (Interpreted
> frame)
> - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Interpreted frame)
> -
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
> @bci=95, line=1142 (Interpreted frame)
> - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617
> (Interpreted frame)
> - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> {noformat}
> The reducer's input was 167 9MB binary values coming from the previous map
> job. Per [~gopalv] the BytesColumnVector is stuck trying to reallocate/copy
> all of these values into the same memory buffer.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)