kbendick commented on a change in pull request #1334:
URL: https://github.com/apache/iceberg/pull/1334#discussion_r471895373
##########
File path: data/src/main/java/org/apache/iceberg/data/orc/GenericOrcWriters.java
##########
@@ -231,7 +231,33 @@ public void nonNullWrite(int rowId, String data, ColumnVector output) {
 @Override
 public void nonNullWrite(int rowId, ByteBuffer data, ColumnVector output) {
-  ((BytesColumnVector) output).setRef(rowId, data.array(), 0, data.array().length);
+  // Don't assume that we're reading the entire backing array.
+  //
+  // We use the same logic for on-heap and off-heap buffers, as we don't assume
+  // that the buffer's position is at the beginning when it arrives for the write.
+  // That assumption would be needed to make use of methods like `.slice()`, or of
+  // any shortcut that applies when there is an on-heap backing array and the buffer
+  // is at its starting position (i.e. we skip the `data.hasArray()` check entirely,
+  // since we couldn't apply any optimizations in that case anyway).
+  int position = data.position();
+  int limit = data.limit();
+
+  // Prep for the copy into bytes.
+  byte[] bytes = new byte[data.remaining()];
+
+  // Copy the remainder of the byte buffer into bytes. The relative get()
+  // advances the buffer's position, so we restore it below.
+  data.get(bytes, 0, bytes.length);
+
+  // Reset the byte buffer to its original state.
+  data.limit(limit);
+  data.position(position);
+
+  ((BytesColumnVector) output).setRef(rowId, bytes, 0, bytes.length);
Review comment:
   I took the copying logic from how `org.apache.iceberg.io.ParquetValueWriters` writes into `org.apache.parquet.io.api.Binary`, specifically the need to update both the limit and the position in order to return the original `ByteBuffer`, `data`, to its original state, as `ByteBuffer` offers no single API call to do that.
   Since I was no longer going to apply any optimizations based on whether the buffer is on heap or not (such as the use of `.slice()`, `.duplicate()`, or `.array()`), I chose to skip the `data.hasArray()` check entirely.
   I essentially emulated the logic for copying and then resetting the original byte buffer `data`'s position (which could otherwise be invalidated by moving the position past the limit), as demonstrated in `org.apache.parquet.io.api.Binary`.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]