kbendick commented on a change in pull request #1334:
URL: https://github.com/apache/iceberg/pull/1334#discussion_r471068779
##########
File path: data/src/main/java/org/apache/iceberg/data/orc/GenericOrcWriters.java
##########
@@ -231,7 +231,23 @@ public void nonNullWrite(int rowId, String data,
ColumnVector output) {
@Override
public void nonNullWrite(int rowId, ByteBuffer data, ColumnVector output) {
- ((BytesColumnVector) output).setRef(rowId, data.array(), 0,
data.array().length);
+ // When there is a backing heap based byte array, we avoid the overhead
of
+ // copying it.
+ if (data.hasArray()) {
+ // Don't assume the we're reading in the entire backing array.
+ // Using slice as any mutations to the data at rowId of output
+ // would also be visible in `data`.
+ ByteBuffer slice = data.slice();
+ ((BytesColumnVector) output).setRef(rowId, slice.array(), 0,
slice.array().length);
Review comment:
I updated this line to not assume that the entire backing array of
`data` is what we're reading in. Via the use of slice, a shallow copy takes
place (which mimics the previous shallow copy). Does this resolve your previous
concern?
I can clean up the comments etc once I know if this resolves any on going
issues or not @rdblue . Thanks so much for your thorough review!
##########
File path: pig/src/main/java/org/apache/iceberg/pig/IcebergPigInputFormat.java
##########
@@ -246,7 +246,11 @@ private boolean advance() throws IOException {
private Object convertPartitionValue(Type type, Object value) {
if (type.typeId() == Types.BinaryType.get().typeId()) {
ByteBuffer buffer = (ByteBuffer) value;
- return new DataByteArray(buffer.get(new
byte[buffer.remaining()]).array());
+ byte[] bytes = new byte[buffer.remaining()];
+ buffer.get(bytes);
+ // Return the input buffer back to its original position.
+ buffer.position(buffer.position() - bytes.length);
+ return new DataByteArray(bytes);
Review comment:
I updated this one as you requested @rdblue, however I wasn't sure if I
should return the input `value` to its original position (vs consuming the
remainder of the byte buffer and moving the position to the end).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]