One possible exception to the access pattern occurs when vectors wrap other vectors. Specifically, the offset vectors in Variable Length and Repeated vectors. These vectors are accessed and mutated multiple times. If we are going to implement strict enforcement, we need to consider that case.
On Tue, Aug 25, 2015 at 7:15 PM, Jacques Nadeau <[email protected]> wrote: > Yes, by recommendation is to correct the usage in StreamingAggBatch > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Tue, Aug 25, 2015 at 4:52 PM, Abdel Hakim Deneche < > [email protected]> > wrote: > > > I think zeroVector() is mainly used to fill the vector with zeros, which > is > > fine if you call it while the vector is in "mutate" state, but > > StreamingAggBatch does actually call it after setting the value count of > > the value vector which is against the paradigm. > > > > > > On Tue, Aug 25, 2015 at 3:51 PM, Jacques Nadeau <[email protected]> > > wrote: > > > > > In all but one situations, this is an internal concern (making sure to > > zero > > > out the memory). For fixed width vectors, there is an assumption that > an > > > initial allocation is clean memory (e.g. all zeros in the faces of an > int > > > vector). So this should be pulled off a public vector interface. The > > one > > > place where it is being used today is StreamingAggBatch and I think we > > > should fix that to follow the state paradigm described above. > > > > > > > > > > > > -- > > > Jacques Nadeau > > > CTO and Co-Founder, Dremio > > > > > > On Tue, Aug 25, 2015 at 3:41 PM, Abdel Hakim Deneche < > > > [email protected]> > > > wrote: > > > > > > > Another question: FixedWidthVector interface defines a zeroVector() > > > method > > > > that > > > > "Zero out the underlying buffer backing this vector" according to > it's > > > > javadoc. > > > > > > > > Where does this method fit in the value vector states described > > earlier ? > > > > it doesn't clear the vector yet it doesn't reset everything to the > > after > > > > allocate state. > > > > > > > > On Tue, Aug 25, 2015 at 10:46 AM, Abdel Hakim Deneche < > > > > [email protected] > > > > > wrote: > > > > > > > > > One more question about the transition from allocate -> mutate. For > > > Fixed > > > > > width vectors and BitVector you can actually call setSafe() without > > > > calling > > > > > allocateNew() first and it will work. Should it throw an exception > > > > instead > > > > > ? > > > > > not calling allocateNew() has side effects that could cause > setSafe() > > > to > > > > > throw an OversizedAllocationException if you call setSafe() then > > > clear() > > > > > multiple times. > > > > > > > > > > On Tue, Aug 25, 2015 at 10:01 AM, Chris Westin < > > > [email protected]> > > > > > wrote: > > > > > > > > > >> Maybe we should start by putting these rules in a comment in the > > value > > > > >> vector base interfaces? The lack of such information is why there > > are > > > > >> deviations and other expectations. > > > > >> > > > > >> On Tue, Aug 25, 2015 at 8:22 AM, Jacques Nadeau < > [email protected] > > > > > > > >> wrote: > > > > >> > > > > >> > There are a few unspoken "rules" around vectors: > > > > >> > > > > > >> > - values need to be written in order (e.g. index 0, 1, 2, 5) > > > > >> > - null vectors start with all values as null before writing > > anything > > > > >> > - for variable width types, the offset vector should be all > zeros > > > > before > > > > >> > writing > > > > >> > - you must call setValueCount before a vector can be read > > > > >> > - you should never write to a vector once it has been read. > > > > >> > > > > > >> > The ultimate goal we should get to the point where you the > > > interfaces > > > > >> > guarantee this order of operation: > > > > >> > > > > > >> > allocate > mutate > setvaluecount > access > clear (or allocate > to > > > > start > > > > >> > the process over, xxx). Any deviation from this pattern should > > > result > > > > >> in > > > > >> > exception. We should do this only in debug mode as this code is > > > > >> extremely > > > > >> > performance sensitive. Operations like transfer should be built > > on > > > > top > > > > >> of > > > > >> > this state model. (In that case, it would mean src moves to > clear > > > > state > > > > >> > and target moves to access state. It also means that transfer > > > should > > > > >> only > > > > >> > work in access state.) > > > > >> > > > > > >> > If we need special purpose data structures that don't operate in > > > these > > > > >> > ways, we should make sure to keep them separate rather than > trying > > > to > > > > >> > accommodate a deviation from this pattern in the core vector > code. > > > > >> > > > > > >> > I wrote xxx above because I see the purpose of zeroVectors as > > being > > > a > > > > >> reset > > > > >> > on the vector state back to the original state. Maybe we should > > > > >> actually > > > > >> > call it 'reset' rather than 'zeroVectors'. This would basically > > > pick > > > > >> up at > > > > >> > mutate mode again. > > > > >> > > > > > >> > Since these rules were never formalized, I'm sure there are a > few > > > > places > > > > >> > where we currently deviate. We should enforce these rules and > > then > > > > get > > > > >> > those issues fixed. > > > > >> > > > > > >> > > > > > >> > > > > > >> > -- > > > > >> > Jacques Nadeau > > > > >> > CTO and Co-Founder, Dremio > > > > >> > > > > > >> > On Tue, Aug 25, 2015 at 8:02 AM, Abdel Hakim Deneche < > > > > >> > [email protected]> > > > > >> > wrote: > > > > >> > > > > > >> > > Another important point to keep in mind here: > > > > >> ValueVectorWriteExpression > > > > >> > > operates under the "undocumented" assumption that the > > destination > > > > >> vector > > > > >> > is > > > > >> > > empty, this way it can safely skip writing null values. In the > > > case > > > > of > > > > >> > > window functions I am using a value vector as an internal > buffer > > > to > > > > >> hold > > > > >> > > values between batches which voids the assumption. > > > > >> > > If this assumption is indeed correct then adding zeroVector to > > > value > > > > >> > > vectors is indeed the way to go. > > > > >> > > > > > > >> > > On Mon, Aug 24, 2015 at 8:30 PM, Jacques Nadeau < > > > [email protected] > > > > > > > > > >> > > wrote: > > > > >> > > > > > > >> > > > In general, let's try to avoid extending the core structures > > > like > > > > >> value > > > > >> > > > vector read and write expressions for a single operator. > > > > Zerovector > > > > >> is > > > > >> > > > trivial to implement so let's resolve that way (trivial > since > > > the > > > > >> > > > underlying vector already has it and we just need to > delegate > > > > down). > > > > >> > > > On Aug 24, 2015 3:36 PM, "Aman Sinha" <[email protected] > > > > > > wrote: > > > > >> > > > > > > > >> > > > > I am reviewing Hakim's patch for DRILL-3668 (first_value > > > window > > > > >> > > function > > > > >> > > > > incorrect result). His code uses > ValueVectorWriteExpression > > > to > > > > >> set > > > > >> > > > values > > > > >> > > > > in an internal batch which get re-used across different > > > > >> partitions of > > > > >> > > the > > > > >> > > > > window function. Ideally, we just want to zero out the > > vector > > > > >> rather > > > > >> > > > than > > > > >> > > > > calling clear() since clear() will release the buffer. > > > > >> > > > > > > > > >> > > > > However, currently zeroVectors() is only supported by > > > > >> > FixedWidthVector, > > > > >> > > > not > > > > >> > > > > VariableWidthVector. * Should there be such an interface > > for > > > > >> > variable > > > > >> > > > > width ? * The implementation could zero out just the > offset > > > > >> vector. > > > > >> > > > > > > > > >> > > > > In the absence of such an interface, Hakim has added a > > boolean > > > > >> flag > > > > >> > > > > witeNulls to ValueVectorWriteExpression (see > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > > > > > > > https://github.com/adeneche/incubator-drill/commit/cab73cd1a50163dd25fe0f9c55c264780ea3616d > > > > >> > > > > ) > > > > >> > > > > and is conditionally doing the null-ing out in the > > generated > > > > >> code. > > > > >> > It > > > > >> > > > > won't affect the normal code path, it would get used for > > > > specific > > > > >> > > window > > > > >> > > > > functions. > > > > >> > > > > > > > > >> > > > > I am thinking of committing his patch and tracking the > > > > >> zeroVectors() > > > > >> > > > > enhancement separately (if people agree that it would be > > > > useful). > > > > >> > Let > > > > >> > > me > > > > >> > > > > know... > > > > >> > > > > > > > > >> > > > > Aman > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > -- > > > > >> > > > > > > >> > > Abdelhakim Deneche > > > > >> > > > > > > >> > > Software Engineer > > > > >> > > > > > > >> > > <http://www.mapr.com/> > > > > >> > > > > > > >> > > > > > > >> > > Now Available - Free Hadoop On-Demand Training > > > > >> > > < > > > > >> > > > > > > >> > > > > > >> > > > > > > > > > > http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > Abdelhakim Deneche > > > > > > > > > > Software Engineer > > > > > > > > > > <http://www.mapr.com/> > > > > > > > > > > > > > > > Now Available - Free Hadoop On-Demand Training > > > > > < > > > > > > > > > > http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > Abdelhakim Deneche > > > > > > > > Software Engineer > > > > > > > > <http://www.mapr.com/> > > > > > > > > > > > > Now Available - Free Hadoop On-Demand Training > > > > < > > > > > > > > > > http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available > > > > > > > > > > > > > > > > > > > > -- > > > > Abdelhakim Deneche > > > > Software Engineer > > > > <http://www.mapr.com/> > > > > > > Now Available - Free Hadoop On-Demand Training > > < > > > http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available > > > > > >
