Github user parthchandra commented on a diff in the pull request:
https://github.com/apache/drill/pull/781#discussion_r106792968
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java
---
@@ -502,11 +502,15 @@ public void setSafe(int index, byte[] bytes) {
assert index >= 0;
final int currentOffset = offsetVector.getAccessor().get(index);
- while (data.capacity() < currentOffset + bytes.length) {
- reAlloc();
- }
offsetVector.getMutator().setSafe(index + 1, currentOffset +
bytes.length);
- data.setBytes(currentOffset, bytes, 0, bytes.length);
+ try {
+ data.setBytes(currentOffset, bytes, 0, bytes.length);
+ } catch (IndexOutOfBoundsException e) {
--- End diff --
The check for a DrillBuf exceeding bounds is already being done once in
Netty (which also throws the Exception). What we want to do is to avoid having
to do more than one bounds check for every vector write operation. By adding
the suggested check, we simply move the check from every vector setSafe method
to every Drillbuf write. This would possibly impact performance in other parts
of the code.
I particularly changed only var len vectors to address a hotspot in the
Parquet reader performance, because as you are suggesting, the right way to fix
is to address the low density batch problem at the same time as the vector
overflow problem. Perhaps a longer discussion may be required here.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---