kbendick commented on a change in pull request #1334:
URL: https://github.com/apache/iceberg/pull/1334#discussion_r472779129
##########
File path: data/src/main/java/org/apache/iceberg/data/orc/GenericOrcWriters.java
##########
@@ -231,7 +231,23 @@ public void nonNullWrite(int rowId, String data,
ColumnVector output) {
@Override
public void nonNullWrite(int rowId, ByteBuffer data, ColumnVector output) {
- ((BytesColumnVector) output).setRef(rowId, data.array(), 0,
data.array().length);
+ // When there is a backing heap based byte array, we avoid the overhead
of
+ // copying it.
+ if (data.hasArray()) {
+ // Don't assume the we're reading in the entire backing array.
+ // Using slice as any mutations to the data at rowId of output
+ // would also be visible in `data`.
+ ByteBuffer slice = data.slice();
+ ((BytesColumnVector) output).setRef(rowId, slice.array(), 0,
slice.array().length);
+ } else {
+ // Consume the remaining contents of the input data
+ byte[] bytes = new byte[data.remaining()];
+ data.get(bytes);
+ // Restores the buffer position
+ // TODO - Is this necessary?
+ data.position(data.position() - bytes.length);
Review comment:
Ok. I will prefer `duplicate` here. I'll still need to use a
Suppression, but it will be the correct solution if I understand correctly.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]