[
https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256589#comment-16256589
]
ASF GitHub Bot commented on DRILL-5657:
---------------------------------------
Github user paul-rogers commented on the issue:
https://github.com/apache/drill/pull/914
Regarding the use of memory addresses: the only reason to do so is
performance. To show the benefit of using addresses, I reran the
`PerformanceTool` class to test the original code, the code using addresses,
and a version that uses `DrillBuf` as @parthchandra suggested. I expected to
see that using addresses was a winner. That's not at all what happened.
The code contains a class, `PerformanceTool`, that compares the column
writers with the original vector mutators. Each test fills a vector to 16 MB
in size and repeats that 300 times. The following are the run times, in ms.
Vector Type | Original | New w/Address | New w/DrillBuf
------------ | -------- | ------------ | -------------
Required | 5703 | 4034 | 1461
Nullable | 12743 | 3645 | 3411
Repeated | 20430 | 7226 | 2669
Here:
* The "Original" column uses the original int vector mutator class.
* "New w/Address" shows the same exercise, using the version of the vector
writers based on a direct memory address.
* "New w/DrillBuf" shows the vector writers, but using the technique Parth
suggested to create "unsafe" methods on the `DrillBuf` class.
The test is run with a pre-allocated vector (no double-and-copy
operations). See `PerformanceTool` for details.
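As a rough illustration of the test shape described above (a pre-allocated
16 MB buffer filled with ints, repeated 300 times), here is a minimal sketch.
It is not the `PerformanceTool` code: it uses a plain `java.nio` direct
`ByteBuffer` as a stand-in for a vector, and the class and method names
(`FillBenchmarkSketch`, `timeFill`) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for the benchmark shape; not Drill's PerformanceTool.
public class FillBenchmarkSketch {

    /** Fills the pre-allocated buffer with ints, reps times; returns elapsed ms. */
    public static long timeFill(ByteBuffer buf, int reps) {
        int count = buf.capacity() / Integer.BYTES;
        long start = System.nanoTime();
        for (int r = 0; r < reps; r++) {
            for (int i = 0; i < count; i++) {
                // Absolute write: no reallocation, no double-and-copy.
                buf.putInt(i * Integer.BYTES, i);
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Allocate once, up front, so growth never enters the timing.
        ByteBuffer buf = ByteBuffer.allocateDirect(16 * 1024 * 1024)
                                   .order(ByteOrder.nativeOrder());
        timeFill(buf, 10); // warm-up so the JIT compiles the loop first
        System.out.println("elapsed ms: " + timeFill(buf, 300));
    }
}
```

Absolute timings from a sketch like this depend on the JVM and hardware and
are not comparable to the numbers in the table.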
I have no explanation for why the `DrillBuf` version should be faster at
all, let alone far faster, but I'll take it. The latest commit contains the
code after this revision.
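To put "far faster" in numbers, the speedups can be computed directly from
the run times in the table. The snippet below is only arithmetic on those
figures; the class and method names are hypothetical.

```java
import java.util.Locale;

// Computes speedup ratios from the run times (ms) reported in the table.
public class SpeedupTable {

    /** Ratio of baseline time to candidate time; above 1.0 means the candidate is faster. */
    public static double speedup(long baselineMs, long candidateMs) {
        return (double) baselineMs / candidateMs;
    }

    public static void main(String[] args) {
        String[] types    = {"Required", "Nullable", "Repeated"};
        long[]   original = {5703, 12743, 20430};
        long[]   address  = {4034, 3645, 7226};
        long[]   drillBuf = {1461, 3411, 2669};

        for (int i = 0; i < types.length; i++) {
            System.out.println(String.format(Locale.ROOT,
                "%-8s  DrillBuf vs original: %.2fx  DrillBuf vs address: %.2fx",
                types[i],
                speedup(original[i], drillBuf[i]),
                speedup(address[i], drillBuf[i])));
        }
    }
}
```

By this arithmetic the `DrillBuf` version is roughly 3.9x, 3.7x, and 7.7x
faster than the original for required, nullable, and repeated vectors, and
roughly 2.8x, 1.1x, and 2.7x faster than the address-based version.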
So, thank you Parth, you were right again with what turned out to be an
outstanding performance boost.
> Implement size-aware result set loader
> --------------------------------------
>
> Key: DRILL-5657
> URL: https://issues.apache.org/jira/browse/DRILL-5657
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: Future
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: Future
>
>
> A recent extension to Drill's set of test tools created a "row set"
> abstraction to allow us to create, and verify, record batches with very few
> lines of code. Part of this work involved creating a set of "column
> accessors" in the vector subsystem. Column readers provide a uniform API to
> obtain data from columns (vectors), while column writers provide a uniform
> writing interface.
> DRILL-5211 discusses a set of changes to limit value vectors to 16 MB in size
> (to avoid memory fragmentation due to Drill's two memory allocators). The
> column accessors have proven to be so useful that they will be the basis for
> the new, size-aware writers used by Drill's record readers.
> A step in that direction is to retrofit the column writers to use the
> size-aware {{setScalar()}} and {{setArray()}} methods introduced in
> DRILL-5517.
> Since the test framework row set classes are (at present) the only consumer
> of the accessors, those classes must also be updated with the changes.
> This then allows us to add a new "row mutator" class that handles size-aware
> vector writing, including the case in which a vector fills in the middle of a
> row.