I think it would be worthwhile to research when this was added and why. I took a quick glance but hadn't figured out why this was introduced. At first glance, it seems like it was done to fix what is most likely a bug elsewhere.
@Chris, agreed. The first goal was to at least get them documented in this thread. -- Jacques Nadeau CTO and Co-Founder, Dremio On Tue, Aug 25, 2015 at 10:46 AM, Abdel Hakim Deneche <[email protected] > wrote: > One more question about the transition from allocate -> mutate. For Fixed > width vectors and BitVector you can actually call setSafe() without calling > allocateNew() first and it will work. Should it throw an exception instead > ? > not calling allocateNew() has side effects that could cause setSafe() to > throw an OversizedAllocationException if you call setSafe() then clear() > multiple times. > > On Tue, Aug 25, 2015 at 10:01 AM, Chris Westin <[email protected]> > wrote: > > > Maybe we should start by putting these rules in a comment in the value > > vector base interfaces? The lack of such information is why there are > > deviations and other expectations. > > > > On Tue, Aug 25, 2015 at 8:22 AM, Jacques Nadeau <[email protected]> > > wrote: > > > > > There are a few unspoken "rules" around vectors: > > > > > > - values need to be written in order (e.g. index 0, 1, 2, 5) > > > - null vectors start with all values as null before writing anything > > > - for variable width types, the offset vector should be all zeros > before > > > writing > > > - you must call setValueCount before a vector can be read > > > - you should never write to a vector once it has been read. > > > > > > The ultimate goal we should get to the point where you the interfaces > > > guarantee this order of operation: > > > > > > allocate > mutate > setvaluecount > access > clear (or allocate to > start > > > the process over, xxx). Any deviation from this pattern should result > in > > > exception. We should do this only in debug mode as this code is > > extremely > > > performance sensitive. Operations like transfer should be built on top > > of > > > this state model. (In that case, it would mean src moves to clear > state > > > and target moves to access state. It also means that transfer should > > only > > > work in access state.) > > > > > > If we need special purpose data structures that don't operate in these > > > ways, we should make sure to keep them separate rather than trying to > > > accommodate a deviation from this pattern in the core vector code. > > > > > > I wrote xxx above because I see the purpose of zeroVectors as being a > > reset > > > on the vector state back to the original state. Maybe we should > actually > > > call it 'reset' rather than 'zeroVectors'. This would basically pick > up > > at > > > mutate mode again. > > > > > > Since these rules were never formalized, I'm sure there are a few > places > > > where we currently deviate. We should enforce these rules and then get > > > those issues fixed. > > > > > > > > > > > > -- > > > Jacques Nadeau > > > CTO and Co-Founder, Dremio > > > > > > On Tue, Aug 25, 2015 at 8:02 AM, Abdel Hakim Deneche < > > > [email protected]> > > > wrote: > > > > > > > Another important point to keep in mind here: > > ValueVectorWriteExpression > > > > operates under the "undocumented" assumption that the destination > > vector > > > is > > > > empty, this way it can safely skip writing null values. In the case > of > > > > window functions I am using a value vector as an internal buffer to > > hold > > > > values between batches which voids the assumption. > > > > If this assumption is indeed correct then adding zeroVector to value > > > > vectors is indeed the way to go. > > > > > > > > On Mon, Aug 24, 2015 at 8:30 PM, Jacques Nadeau <[email protected]> > > > > wrote: > > > > > > > > > In general, let's try to avoid extending the core structures like > > value > > > > > vector read and write expressions for a single operator. Zerovector > > is > > > > > trivial to implement so let's resolve that way (trivial since the > > > > > underlying vector already has it and we just need to delegate > down). > > > > > On Aug 24, 2015 3:36 PM, "Aman Sinha" <[email protected]> > wrote: > > > > > > > > > > > I am reviewing Hakim's patch for DRILL-3668 (first_value window > > > > function > > > > > > incorrect result). His code uses ValueVectorWriteExpression to > set > > > > > values > > > > > > in an internal batch which get re-used across different > partitions > > of > > > > the > > > > > > window function. Ideally, we just want to zero out the vector > > rather > > > > > than > > > > > > calling clear() since clear() will release the buffer. > > > > > > > > > > > > However, currently zeroVectors() is only supported by > > > FixedWidthVector, > > > > > not > > > > > > VariableWidthVector. * Should there be such an interface for > > > variable > > > > > > width ? * The implementation could zero out just the offset > vector. > > > > > > > > > > > > In the absence of such an interface, Hakim has added a boolean > flag > > > > > > witeNulls to ValueVectorWriteExpression (see > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/adeneche/incubator-drill/commit/cab73cd1a50163dd25fe0f9c55c264780ea3616d > > > > > > ) > > > > > > and is conditionally doing the null-ing out in the generated > code. > > > It > > > > > > won't affect the normal code path, it would get used for specific > > > > window > > > > > > functions. > > > > > > > > > > > > I am thinking of committing his patch and tracking the > > zeroVectors() > > > > > > enhancement separately (if people agree that it would be useful). > > > Let > > > > me > > > > > > know... > > > > > > > > > > > > Aman > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > Abdelhakim Deneche > > > > > > > > Software Engineer > > > > > > > > <http://www.mapr.com/> > > > > > > > > > > > > Now Available - Free Hadoop On-Demand Training > > > > < > > > > > > > > > > http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available > > > > > > > > > > > > > > > > > > -- > > Abdelhakim Deneche > > Software Engineer > > <http://www.mapr.com/> > > > Now Available - Free Hadoop On-Demand Training > < > http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available > > >
