Maybe we should start by putting these rules in a comment in the value
vector base interfaces? The lack of such information is why there are
deviations and other expectations.

On Tue, Aug 25, 2015 at 8:22 AM, Jacques Nadeau <[email protected]> wrote:

> There are a few unspoken "rules" around vectors:
>
> - values need to be written in order (e.g. index 0, 1, 2, 5)
> - null vectors start with all values as null before writing anything
> - for variable width types, the offset vector should be all zeros before
> writing
> - you must call setValueCount before a vector can be read
> - you should never write to a vector once it has been read.
>
> The ultimate goal we should get to the point where you the interfaces
> guarantee this order of operation:
>
> allocate > mutate > setvaluecount > access > clear (or allocate to start
> the process over, xxx).  Any deviation from this pattern should result in
> exception.  We should do this only in debug mode as this code is extremely
> performance sensitive.  Operations like transfer should be built on top of
> this state model.  (In that case, it would mean src moves to clear state
> and target moves to access state.  It also means that transfer should only
> work in access state.)
>
> If we need special purpose data structures that don't operate in these
> ways, we should make sure to keep them separate rather than trying to
> accommodate a deviation from this pattern in the core vector code.
>
> I wrote xxx above because I see the purpose of zeroVectors as being a reset
> on the vector state back to the original state.  Maybe we should actually
> call it 'reset' rather than 'zeroVectors'.  This would basically pick up at
> mutate mode again.
>
> Since these rules were never formalized, I'm sure there are a few places
> where we currently deviate.  We should enforce these rules and then get
> those issues fixed.
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Tue, Aug 25, 2015 at 8:02 AM, Abdel Hakim Deneche <
> [email protected]>
> wrote:
>
> > Another important point to keep in mind here: ValueVectorWriteExpression
> > operates under the "undocumented" assumption that the destination vector
> is
> > empty, this way it can safely skip writing null values. In the case of
> > window functions I am using a value vector as an internal buffer to hold
> > values between batches which voids the assumption.
> > If this assumption is indeed correct then adding zeroVector to value
> > vectors is indeed the way to go.
> >
> > On Mon, Aug 24, 2015 at 8:30 PM, Jacques Nadeau <[email protected]>
> > wrote:
> >
> > > In general, let's try to avoid extending the core structures like value
> > > vector read and write expressions for a single operator. Zerovector is
> > > trivial to implement so let's resolve that way (trivial since the
> > > underlying vector already has it and we just need to delegate down).
> > > On Aug 24, 2015 3:36 PM, "Aman Sinha" <[email protected]> wrote:
> > >
> > > > I am reviewing Hakim's patch for DRILL-3668 (first_value window
> > function
> > > > incorrect result).  His code uses ValueVectorWriteExpression to set
> > > values
> > > > in an internal batch which get re-used across different partitions of
> > the
> > > > window function.  Ideally, we just want to zero out the vector rather
> > > than
> > > > calling clear() since clear() will release the buffer.
> > > >
> > > > However, currently zeroVectors() is only supported by
> FixedWidthVector,
> > > not
> > > > VariableWidthVector.  * Should there be such an interface for
> variable
> > > > width ? * The implementation could zero out just the offset vector.
> > > >
> > > > In the absence of such an interface, Hakim has added a boolean flag
> > > > witeNulls to ValueVectorWriteExpression (see
> > > >
> > > >
> > >
> >
> https://github.com/adeneche/incubator-drill/commit/cab73cd1a50163dd25fe0f9c55c264780ea3616d
> > > > )
> > > >  and is conditionally doing the null-ing out in the generated code.
> It
> > > > won't affect the normal code path, it would get used for specific
> > window
> > > > functions.
> > > >
> > > > I am thinking of committing his patch and tracking the zeroVectors()
> > > > enhancement separately (if people agree that it would be useful).
> Let
> > me
> > > > know...
> > > >
> > > > Aman
> > > >
> > >
> >
> >
> >
> > --
> >
> > Abdelhakim Deneche
> >
> > Software Engineer
> >
> >   <http://www.mapr.com/>
> >
> >
> > Now Available - Free Hadoop On-Demand Training
> > <
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > >
> >
>

Reply via email to