Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/781#discussion_r106491214
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java
---
@@ -502,11 +502,15 @@ public void setSafe(int index, byte[] bytes) {
assert index >= 0;
final int currentOffset = offsetVector.getAccessor().get(index);
- while (data.capacity() < currentOffset + bytes.length) {
- reAlloc();
- }
offsetVector.getMutator().setSafe(index + 1, currentOffset +
bytes.length);
- data.setBytes(currentOffset, bytes, 0, bytes.length);
+ try {
+ data.setBytes(currentOffset, bytes, 0, bytes.length);
+ } catch (IndexOutOfBoundsException e) {
--- End diff --
While this is a clever way to avoid checks, it will lead to difficulty when
debugging Drill. Intentionally throwing a common exception makes it even harder
to find cases where the exception indicates an error.
Let's take a step back. One of the things we need to change in Parquet is
to avoid "low density batches" vectors that have very little data. Turns out
one reason is tied up with the assumption that the code makes that it can tell
when it has reached the end of a vector. (There are many bugs, but that is the
key idea.)
Vectors don't have that ability today, so the code never worked.
What if we solve that problem, and yours, by changing how the DrillBuf
works:
```
if ( ! data.setBytesIfCapacity(...)) {
reAlloc();
data.setBytes(...)
}
```
The above avoids the spurious exception *and* provides the means to manage
variable-length vectors in Parquet.
Note that the bounds check is still done, but only inside Drillbuf. And, of
course, that same check is done with the PR code: that check is what raises the
exception.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---