jacques-n commented on a change in pull request #6402:
URL: https://github.com/apache/arrow/pull/6402#discussion_r439003304
##########
File path:
java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java
##########
@@ -740,10 +740,16 @@ private void splitAndTransferOffsetBuffer(int startIndex,
int length, BaseVariab
final int start = offsetBuffer.getInt(startIndex * OFFSET_WIDTH);
final int end = offsetBuffer.getInt((startIndex + length) * OFFSET_WIDTH);
final int dataLength = end - start;
- target.allocateOffsetBuffer((length + 1) * OFFSET_WIDTH);
- for (int i = 0; i < length + 1; i++) {
- final int relativeSourceOffset = offsetBuffer.getInt((startIndex + i) *
OFFSET_WIDTH) - start;
- target.offsetBuffer.setInt(i * OFFSET_WIDTH, relativeSourceOffset);
+
+ if (startIndex == 0) {
+ target.offsetBuffer = offsetBuffer.slice(0, (1 + length) * OFFSET_WIDTH);
+ target.offsetBuffer.getReferenceManager().retain();
Review comment:
splitAndTransfer is trying to do best effort transfer of data, not copy.
However, for some situations some allocations have to occur. This is only true
for split and transfer. Normal transfers never have to copy data and only about
ownership.
I can't actually get to the commit these comments are on to confirm the
specific details. It makes sense that a split and transfer that starts at the
same initial offset doesn't need to rewrite the offsets and therefore would do
a transfer. A clearer/more complete optimization would probably to simply read
the initial starting offset at the index that we're slicing from and if that is
zero, don't do the allocate/copy. In that case, if you have a bunch of empty
strings or null values, you still can do a split and transfer with zero
copy/allocation.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]