Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/754#discussion_r102382826
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/cache/VectorAccessibleSerializable.java
---
@@ -57,6 +57,12 @@
private BatchSchema.SelectionVectorMode svMode =
BatchSchema.SelectionVectorMode.NONE;
private SelectionVector2 sv2;
+ /**
+ * Disk I/O buffer used for all reads and writes of DrillBufs.
+ */
+
+ private byte buffer[] = new byte[32*1024];
--- End diff --
The key issues with the location of the buffer are:
* We want to reuse the buffer as much as possible. (Hence, putting it on the
operator is a good idea.)
* We want to keep interfaces simple. (Passing the buffer from the operator to
everything that needs it is awkward.)
* The buffer can be used by only one thread at a time.
After playing around, it turns out we can move the read & write methods
onto the {{BufferAllocator}} class. This makes them available to anything that
uses DrillBufs. It also allows the allocator to hold on to the (shared) I/O
buffer.
Reflecting on the sort, it becomes clear that such a change is necessary.
In the merge phase, the sort will have many spill runs open; each will have its
own {{VectorAccessibleSerializable}}, each with its own buffer. Moving the
buffer to the allocator reduces the need to a single shared buffer.
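
To make the idea concrete, here is a minimal sketch of an allocator-like object that owns one reusable I/O buffer and exposes the read and write methods. It is an illustration only, not the code in this PR: the class name {{SharedIoBufferSketch}} and its method signatures are hypothetical, and {{java.nio.ByteBuffer}} stands in for {{DrillBuf}}.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;

/**
 * Hypothetical stand-in for an allocator that owns a single reusable
 * I/O buffer. ByteBuffer is used in place of DrillBuf; none of these
 * names are taken from Drill's actual API.
 */
class SharedIoBufferSketch {
  // One buffer shared by every read and write that goes through this object.
  // Not thread-safe: only one thread may use this object at a time.
  private final byte[] ioBuffer;

  SharedIoBufferSketch(int bufferSize) {
    if (bufferSize <= 0) {
      throw new IllegalArgumentException("bufferSize must be positive");
    }
    this.ioBuffer = new byte[bufferSize];
  }

  /** Copies the remaining bytes of src to out, one buffer-sized chunk at a time. */
  void write(ByteBuffer src, OutputStream out) throws IOException {
    ByteBuffer view = src.duplicate(); // leave the caller's position untouched
    while (view.hasRemaining()) {
      int n = Math.min(view.remaining(), ioBuffer.length);
      view.get(ioBuffer, 0, n);
      out.write(ioBuffer, 0, n);
    }
  }

  /** Reads exactly length bytes from in and appends them to dest. */
  void read(ByteBuffer dest, int length, InputStream in) throws IOException {
    int remaining = length;
    while (remaining > 0) {
      int n = in.read(ioBuffer, 0, Math.min(remaining, ioBuffer.length));
      if (n < 0) {
        throw new EOFException("stream ended with " + remaining + " bytes unread");
      }
      dest.put(ioBuffer, 0, n);
      remaining -= n;
    }
  }
}
```

With this shape, each open spill run would call into the one shared object rather than allocating its own 32 KB array, which is the saving described above for the merge phase.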