Github user paul-rogers commented on the issue:
https://github.com/apache/drill/pull/887
This set of commits provide performance tuning of the column writer
implementation. Discussion during the code review identified a number of tuning
opportunities that lead to this work. The commits divide the files by layer:
* Value vectors
* Column accessors
* Row set classes used for testing
* Secondary files affected by the above changes
The result is a very nice performance improvement relative to the original
vector mutator methods. The writers now provide:
* 42% faster writes for required values
* 73% faster writes for nullable values
* 408% (4x) faster writes for repeated values
Not a bad outcome from a casual code review conversation!
### Merge Vector Overflow into Column Writers
DRILL-5688 and DRILL-5657 originally separated the tasks of detecting and
handling vector overflow. (Vector overflow occurs when we are about to write
past the maximum buffer size.) The result was that each âsetâ operation
needed to traverse two layers: the âcolumn loaderâ which handles overflow,
and the âcolumn writerâ which writes the value to a vector.
Discussion identified an opportunity to combine these into a single layer.
The revised column accessors in this PR achieve that revision: each accessor
detects overflow, call a handler to handle the overflow, then continues to
write operation into the (presumably) new vector.
The âresult set loaderâ (DRILL-5657) abstraction contains the logic to
handle overflows. The original ârow setâ abstractions for tests donât
handle overflow; instead they throw a index out-of-bounds exception. (Tests
seldom write enough data to trigger an overflow unless deliberately testing the
overflow feature.)
### Bypass the Vector Mutator
The big performance improvement was to avoid the overhead of the original
vector mutator. In the original code, each write travels through many layers of
code:
* Vector mutator
* Drillbuf
* UDLE
* Netty platform dependent
* Java Unsafe
The revised column writers bypass the first three layers; and instead use
the Netty platform dependent methods to write to vector memory. In this, they
follow the text reader implementation which does something similar.
### Specialized Offset Vector Writer
For arrays (repeated) and variable-width values, the original code is quite
inefficient as it must do a direct memory read, and two writes, for each value.
The revised writer code for variable-width fields does two writes. For arrays
(repeated) values, the writer code does just one write per value and one write
per row. (This is why the writer code is so much faster: the test writes five
values per row: 15 direct memory I/Os in the original code, just 6 in the new
writer code.)
A new `OffsetVectorWriter` class manages the details of writing to the
offset vector.
### Highly-Optimized Writer Code
The unfortunate side-effect of the optimization is that the writers are no
longer simple. The previous version (which should be reviewed first) delegate
work to the vector mutator. The new version provides an integrated set of code
that:
* Does one single bounds check per value to
* Check if the vector size must increase, and if so
* Check if the growth would cause overflow
* Also check if the âfill emptiesâ logic is needed to fill in missing
values
On the other hand, it became clear that the writers could be simplified.
The previous version, because it worked with the mutators in each vector class,
needed a separate writer for required, optional and repeated vectors. The new
code, because it works directly with Nettyâs platform dependent layer, can
use a single writer for all three data modes.
Now, one might note that the single implementation has some very slight
overhead:
* The nullable vector need not fill empty âdataâ values (null values
can be garbage, empty filling is really only needed for the âbitsâ vector)
* Repeated writers need not fill empty values (there may be empties in the
offset vector, but not the data vector.)
However, the cost is a single comparison, so it did not seem worth the
effort to generate three complex classes to avoid a single comparison. (The
performance test results seem to back up this assumption.)
Because of the above, the `ColumnAccessors.java` template has become a bit
more complex (as if it was not already plenty complex.) Best is to look at the
generated `ColumnAccessors` class, and its nested classes, to see the result of
the template.
### Non-zeroing Vector Reallocation
The existing `reAlloc` methods double vector size and zero-fill the upper
half of the new vector. This causes the `setSafe()` methods to use loops to
call `reAlloc()` multiple times in case the vector may need to quadruple or
more.
The revised column writer code avoids these two performance penalties by
* Computing the required new vector size so it can be allocated in one go
and
* Skipping the zeroing step.
Existing code continues to use the original implementation.
### Collateral Code Cleanup
Because vector overflow is now integrated into the writers, there is no
longer a need to throw the prior `VectorOverflowException`, which is now
removed. The code changes in prior commits to handle this exception are removed
in this commit.
DRILL-5517 added âsize awareâ methods to value vectors to support the
earlier version of the writers which used the vector mutators. Since that logic
is now integrated into the writers themselves, this commit backs out those
changes to tidy up the value vector code.
Previous versions of the row set writers flattened maps for easier creation
of test data in the `RowSetBuilder`. Earlier commits changed this behavior to
require an array of values to populate a map. A number of tests needed updating
as a result.
Retrofitted a couple of tests to use the `SubOperatorTest` base class.
Split the `MapWriter` into two distinct base classes so that the code can
set the value count which is implemented in two different classes without a
common parent class method definition.
Earlier versions had a way to set every column type from an int value for
testing. Moved this code around to make use of the generic object-based set
methods now available.
Added a `PerformanceTool` used to compare writer performance to the
original vector & mutator implementations.
Added a `TestFillEmpties` test which writes every fifth value to every data
type and every mode to ensure that the âfill emptiesâ logic works for all.
This also has the handy side-effect of verifying null handling and basic scalar
writer behavior.
### Readers
A logical question is, âif the column writers benefited from
optimization, what about the column readers?â Writers are the focus of the
âmemory fragmentationâ project which will replace the low-level vector
writers in various operators.
For now, readers are only used in test code - which is not performance
critical. We can apply the same technique to readers when they find themselves
on the critical performance path.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---