[
https://issues.apache.org/jira/browse/DRILL-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424957#comment-16424957
]
ASF GitHub Bot commented on DRILL-6202:
---------------------------------------
Github user paul-rogers commented on the issue:
https://github.com/apache/drill/pull/1144
My two cents... DrillBuf is the only memory-level abstraction that (low
level) Drill code should reference. The UDLE and other bits should be fully
encapsulated. This guideline lets us evolve the representation if we ever need
to do so.
The original design appeared to be that value vectors would be the primary
interface to memory. But, a great many issues made that difficult, not least of
which is that vector access methods are heavily typed, resulting in far too
much casting. Also, the mutator methods try to do the full operation, leading
to inefficiency (especially around VarChars).
A more general rule is that application code should work with vectors until
it can migrate to working with the result set loader or reader. (We should
probably call these the row set emitter and collector to be more
Hadoop-like...) The higher-level abstractions handle the grunt work currently
spread throughout operators.
(And, to answer a prior question: we want to use the row set abstractions
so that, on write, we have a uniform way to write to vectors, to control batch
size, to handle schema issues and so on. And, on read, so that we have a
standard way to handle indirection vectors and vector navigation.)
Ideally, only the vector mutator or row set loader implementation works
with DrillBuf to do actual data reads and writes. In an early version, the row
set loader code used `PlatformDependent` to avoid bounds checks. But, with
@vrozov's improvements, doing so became unnecessary -- a nice improvement.
Still, bounds checks should be done during tests: it is handy to work with
a safety net.
Since bounds checks are optional (turned off in production), the
changes here make good sense: no code should count on bounds checks from the
"unchecked" methods for the simple reason that the checks are normally off.
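To make the point concrete, here is a minimal sketch of a write path that compares the index against the capacity and grows the storage itself, rather than catching an exception to trigger a re-alloc. The class `GrowableIntVector` and its methods are hypothetical stand-ins for illustration, not Drill's actual vector classes, and a plain `int[]` takes the place of a `DrillBuf`.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a fixed-width vector; not one of Drill's classes. */
final class GrowableIntVector {
    private int[] values;

    GrowableIntVector(int initialCapacity) {
        values = new int[Math.max(1, initialCapacity)];
    }

    int valueCapacity() {
        return values.length;
    }

    /** Grows the storage first if needed, then writes; no exception is involved. */
    void setSafe(int index, int value) {
        while (index >= valueCapacity()) {
            reAlloc();
        }
        values[index] = value;
    }

    int get(int index) {
        return values[index];
    }

    /** Doubles the capacity, keeping existing values. */
    private void reAlloc() {
        values = Arrays.copyOf(values, values.length * 2);
    }
}
```

Because the capacity test is explicit, the write behaves the same whether or not the underlying buffer performs bounds checks.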
That said, if there is a reason to have "checked" access, we could provide
such methods. Those methods would throw the `IndexOutOfBoundsException`. That
is, the checked methods would recreate the original "get/set" methods prior to
@vrozov's improvements. I can't think of a reason to do that off the top of my
head, but someone might present a valid use case.
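If someone did want that, a rough sketch of the split might look like the following. Everything here is an assumption made for illustration: `SketchBuf`, `getByteChecked` and the `boundsChecking` flag are made-up names, not DrillBuf's API. The buffer is modeled as a window onto a larger array so that an unchecked read past the window's capacity quietly returns neighboring data, much as a read past the end of a direct memory buffer would.

```java
/** Hypothetical buffer that is a window onto a larger array; not DrillBuf's API. */
final class SketchBuf {
    private final byte[] backing;
    private final int offset;
    private final int capacity;
    private final boolean boundsChecking;

    SketchBuf(byte[] backing, int offset, int capacity, boolean boundsChecking) {
        this.backing = backing;
        this.offset = offset;
        this.capacity = capacity;
        this.boundsChecking = boundsChecking;
    }

    /** "Unchecked" accessor: validates the index only when the flag is on. */
    byte getByte(int index) {
        if (boundsChecking) {
            checkIndex(index);
        }
        return backing[offset + index];
    }

    /** "Checked" accessor: always validates, regardless of the flag. */
    byte getByteChecked(int index) {
        checkIndex(index);
        return backing[offset + index];
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= capacity) {
            throw new IndexOutOfBoundsException(
                "index: " + index + ", capacity: " + capacity);
        }
    }
}
```

With the flag off, `getByte` on an index one past the capacity returns whatever byte sits next in the backing array, while `getByteChecked` on the same index throws `IndexOutOfBoundsException`. That is why callers cannot rely on the unchecked path to signal an overrun.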
> Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
> ----------------------------------------------------------------
>
> Key: DRILL-6202
> URL: https://issues.apache.org/jira/browse/DRILL-6202
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Vlad Rozov
> Assignee: Vlad Rozov
> Priority: Major
> Fix For: 1.14.0
>
>
> As bounds checking may be enabled or disabled, using
> IndexOutOfBoundsException to resize vectors is unreliable. It works only when
> bounds checking is enabled.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)