kbendick commented on a change in pull request #1334:
URL: https://github.com/apache/iceberg/pull/1334#discussion_r471172748
##########
File path: data/src/main/java/org/apache/iceberg/data/orc/GenericOrcWriters.java
##########
@@ -231,7 +231,23 @@ public void nonNullWrite(int rowId, String data,
ColumnVector output) {
@Override
public void nonNullWrite(int rowId, ByteBuffer data, ColumnVector output) {
- ((BytesColumnVector) output).setRef(rowId, data.array(), 0,
data.array().length);
+ // When there is a backing heap based byte array, we avoid the overhead
of
+ // copying it.
+ if (data.hasArray()) {
+ // Don't assume the we're reading in the entire backing array.
+ // Using slice as any mutations to the data at rowId of output
+ // would also be visible in `data`.
+ ByteBuffer slice = data.slice();
+ ((BytesColumnVector) output).setRef(rowId, slice.array(), 0,
slice.array().length);
Review comment:
Actually, my solution with slice causes a ByteBufferBackingArray
warning, which could be suppressed as we called `.hasArray` before. But at this
point suppression seems silly given the work that went in to correcting this.
If anybody has any solutions which doesn't involve a deep copy, I'm open to
suggestions.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]