[
https://issues.apache.org/jira/browse/HIVE-23034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shubham Chaurasia reassigned HIVE-23034:
----------------------------------------
> Arrow serializer should not keep the reference of arrow offset and validity
> buffers
> -----------------------------------------------------------------------------------
>
> Key: HIVE-23034
> URL: https://issues.apache.org/jira/browse/HIVE-23034
> Project: Hive
> Issue Type: Bug
> Components: llap, Serializers/Deserializers
> Reporter: Shubham Chaurasia
> Assignee: Shubham Chaurasia
> Priority: Major
>
> Currently, part of the writeList() method in the Arrow serializer is
> implemented like this -
> {code:java}
> final ArrowBuf offsetBuffer = arrowVector.getOffsetBuffer();
> int nextOffset = 0;
> for (int rowIndex = 0; rowIndex < size; rowIndex++) {
>   int selectedIndex = rowIndex;
>   if (vectorizedRowBatch.selectedInUse) {
>     selectedIndex = vectorizedRowBatch.selected[rowIndex];
>   }
>   if (hiveVector.isNull[selectedIndex]) {
>     offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
>   } else {
>     offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
>     nextOffset += (int) hiveVector.lengths[selectedIndex];
>     arrowVector.setNotNull(rowIndex);
>   }
> }
> offsetBuffer.setInt(size * OFFSET_WIDTH, nextOffset);
> {code}
> 1) Here we obtain a reference with {{final ArrowBuf offsetBuffer =
> arrowVector.getOffsetBuffer();}} and then keep updating both the arrow vector
> and the offset buffer through that reference.
> Problem -
> {{arrowVector.setNotNull(rowIndex)}} checks the index on every call and, when
> a threshold is crossed, reallocates the offset and validity buffers, updates
> its internal references, and releases the old buffers (which decrements their
> reference count). The reference obtained in 1) is then stale. If we try to
> read or write the old buffer after that, we see -
> {code:java}
> Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0
> at io.netty.buffer.AbstractByteBuf.ensureAccessible(AbstractByteBuf.java:1413)
> at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:131)
> at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:162)
> at io.netty.buffer.ArrowBuf.setInt(ArrowBuf.java:656)
> at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:432)
> at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
> at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:352)
> at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:288)
> at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:419)
> at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
> at org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:205)
> {code}
>
> Solution -
> This can be fixed by fetching the buffer again
> ({{arrowVector.getOffsetBuffer()}}) each time we want to update it, instead
> of caching the reference.
> In our internal tests, this failure is seen very frequently on Arrow 0.8.0
> and not on 0.10.0. It should still be handled the same way for 0.10.0, since
> that version does the same thing.
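
For illustration, here is a minimal self-contained sketch of the failure mode and of the proposed fix. It does not use Arrow and is not the actual Hive patch: RefCountedBuf and ToyListVector are hypothetical stand-ins for ArrowBuf and the Arrow list vector, the initial capacity of 4 offsets and the doubling policy are assumptions, and null rows are left out to keep it short. The only point it tries to show is that a cached buffer reference goes stale once setNotNull() reallocates, while re-fetching the buffer on each write does not.

```java
// Toy model only: RefCountedBuf and ToyListVector are hypothetical stand-ins,
// not the real ArrowBuf / list vector classes.
class StaleBufferSketch {
    static final int OFFSET_WIDTH = 4; // bytes per offset entry

    /** Stand-in for a reference-counted buffer of int offsets. */
    static final class RefCountedBuf {
        private final int[] slots;
        private int refCnt = 1;

        RefCountedBuf(int slotCount) { this.slots = new int[slotCount]; }

        int slotCount() { return slots.length; }

        void setInt(int byteIndex, int value) {
            ensureAccessible();
            slots[byteIndex / OFFSET_WIDTH] = value;
        }

        int getInt(int byteIndex) {
            ensureAccessible();
            return slots[byteIndex / OFFSET_WIDTH];
        }

        void release() { refCnt--; }

        private void ensureAccessible() {
            if (refCnt <= 0) throw new IllegalStateException("refCnt: " + refCnt);
        }
    }

    /** Stand-in for a list vector whose setNotNull() may reallocate. */
    static final class ToyListVector {
        private RefCountedBuf offsetBuffer = new RefCountedBuf(4); // assumed initial size

        RefCountedBuf getOffsetBuffer() { return offsetBuffer; }

        /** Grows the offset buffer when needed and releases the old one. */
        void setNotNull(int index) {
            while (index + 2 > offsetBuffer.slotCount()) {
                RefCountedBuf bigger = new RefCountedBuf(offsetBuffer.slotCount() * 2);
                for (int i = 0; i < offsetBuffer.slotCount(); i++) {
                    bigger.setInt(i * OFFSET_WIDTH, offsetBuffer.getInt(i * OFFSET_WIDTH));
                }
                offsetBuffer.release(); // the old buffer is no longer usable
                offsetBuffer = bigger;
            }
        }
    }

    /** Mirrors the reported pattern: the reference is fetched once and reused. */
    static void writeWithCachedReference(ToyListVector vector, int[] lengths) {
        final RefCountedBuf offsetBuffer = vector.getOffsetBuffer();
        int nextOffset = 0;
        for (int rowIndex = 0; rowIndex < lengths.length; rowIndex++) {
            offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
            nextOffset += lengths[rowIndex];
            vector.setNotNull(rowIndex); // may swap the buffer behind our back
        }
        offsetBuffer.setInt(lengths.length * OFFSET_WIDTH, nextOffset);
    }

    /** Proposed pattern: ask the vector for its current buffer on every write. */
    static void writeWithFreshReference(ToyListVector vector, int[] lengths) {
        int nextOffset = 0;
        for (int rowIndex = 0; rowIndex < lengths.length; rowIndex++) {
            vector.getOffsetBuffer().setInt(rowIndex * OFFSET_WIDTH, nextOffset);
            nextOffset += lengths[rowIndex];
            vector.setNotNull(rowIndex);
        }
        vector.getOffsetBuffer().setInt(lengths.length * OFFSET_WIDTH, nextOffset);
    }

    public static void main(String[] args) {
        int[] lengths = {2, 2, 2, 2, 2, 2, 2, 2};
        try {
            writeWithCachedReference(new ToyListVector(), lengths);
            System.out.println("cached reference: no failure");
        } catch (IllegalStateException e) {
            System.out.println("cached reference failed: " + e.getMessage());
        }
        ToyListVector vector = new ToyListVector();
        writeWithFreshReference(vector, lengths);
        System.out.println("last offset: "
            + vector.getOffsetBuffer().getInt(lengths.length * OFFSET_WIDTH));
    }
}
```

In the real serializer the same idea would presumably mean calling arrowVector.getOffsetBuffer() at each setInt() call site rather than holding it in a final local.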