There are a few unspoken "rules" around vectors: - values need to be written in order (e.g. index 0, 1, 2, 5) - null vectors start with all values as null before writing anything - for variable width types, the offset vector should be all zeros before writing - you must call setValueCount before a vector can be read - you should never write to a vector once it has been read.
The ultimate goal we should get to the point where you the interfaces guarantee this order of operation: allocate > mutate > setvaluecount > access > clear (or allocate to start the process over, xxx). Any deviation from this pattern should result in exception. We should do this only in debug mode as this code is extremely performance sensitive. Operations like transfer should be built on top of this state model. (In that case, it would mean src moves to clear state and target moves to access state. It also means that transfer should only work in access state.) If we need special purpose data structures that don't operate in these ways, we should make sure to keep them separate rather than trying to accommodate a deviation from this pattern in the core vector code. I wrote xxx above because I see the purpose of zeroVectors as being a reset on the vector state back to the original state. Maybe we should actually call it 'reset' rather than 'zeroVectors'. This would basically pick up at mutate mode again. Since these rules were never formalized, I'm sure there are a few places where we currently deviate. We should enforce these rules and then get those issues fixed. -- Jacques Nadeau CTO and Co-Founder, Dremio On Tue, Aug 25, 2015 at 8:02 AM, Abdel Hakim Deneche <[email protected]> wrote: > Another important point to keep in mind here: ValueVectorWriteExpression > operates under the "undocumented" assumption that the destination vector is > empty, this way it can safely skip writing null values. In the case of > window functions I am using a value vector as an internal buffer to hold > values between batches which voids the assumption. > If this assumption is indeed correct then adding zeroVector to value > vectors is indeed the way to go. > > On Mon, Aug 24, 2015 at 8:30 PM, Jacques Nadeau <[email protected]> > wrote: > > > In general, let's try to avoid extending the core structures like value > > vector read and write expressions for a single operator. Zerovector is > > trivial to implement so let's resolve that way (trivial since the > > underlying vector already has it and we just need to delegate down). > > On Aug 24, 2015 3:36 PM, "Aman Sinha" <[email protected]> wrote: > > > > > I am reviewing Hakim's patch for DRILL-3668 (first_value window > function > > > incorrect result). His code uses ValueVectorWriteExpression to set > > values > > > in an internal batch which get re-used across different partitions of > the > > > window function. Ideally, we just want to zero out the vector rather > > than > > > calling clear() since clear() will release the buffer. > > > > > > However, currently zeroVectors() is only supported by FixedWidthVector, > > not > > > VariableWidthVector. * Should there be such an interface for variable > > > width ? * The implementation could zero out just the offset vector. > > > > > > In the absence of such an interface, Hakim has added a boolean flag > > > witeNulls to ValueVectorWriteExpression (see > > > > > > > > > https://github.com/adeneche/incubator-drill/commit/cab73cd1a50163dd25fe0f9c55c264780ea3616d > > > ) > > > and is conditionally doing the null-ing out in the generated code. It > > > won't affect the normal code path, it would get used for specific > window > > > functions. > > > > > > I am thinking of committing his patch and tracking the zeroVectors() > > > enhancement separately (if people agree that it would be useful). Let > me > > > know... > > > > > > Aman > > > > > > > > > -- > > Abdelhakim Deneche > > Software Engineer > > <http://www.mapr.com/> > > > Now Available - Free Hadoop On-Demand Training > < > http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available > > >
