I can take a look at the Vectors and add asserts to enforce the contract is respected.
On Wed, Aug 26, 2015 at 2:52 PM, Steven Phillips <[email protected]> wrote: > One possible exception to the access pattern occurs when vectors wrap other > vectors. Specifically, the offset vectors in Variable Length and Repeated > vectors. These vectors are accessed and mutated multiple times. If we are > going to implement strict enforcement, we need to consider that case. > > On Tue, Aug 25, 2015 at 7:15 PM, Jacques Nadeau <[email protected]> > wrote: > > > Yes, by recommendation is to correct the usage in StreamingAggBatch > > > > -- > > Jacques Nadeau > > CTO and Co-Founder, Dremio > > > > On Tue, Aug 25, 2015 at 4:52 PM, Abdel Hakim Deneche < > > [email protected]> > > wrote: > > > > > I think zeroVector() is mainly used to fill the vector with zeros, > which > > is > > > fine if you call it while the vector is in "mutate" state, but > > > StreamingAggBatch does actually call it after setting the value count > of > > > the value vector which is against the paradigm. > > > > > > > > > On Tue, Aug 25, 2015 at 3:51 PM, Jacques Nadeau <[email protected]> > > > wrote: > > > > > > > In all but one situations, this is an internal concern (making sure > to > > > zero > > > > out the memory). For fixed width vectors, there is an assumption > that > > an > > > > initial allocation is clean memory (e.g. all zeros in the faces of an > > int > > > > vector). So this should be pulled off a public vector interface. > The > > > one > > > > place where it is being used today is StreamingAggBatch and I think > we > > > > should fix that to follow the state paradigm described above. > > > > > > > > > > > > > > > > -- > > > > Jacques Nadeau > > > > CTO and Co-Founder, Dremio > > > > > > > > On Tue, Aug 25, 2015 at 3:41 PM, Abdel Hakim Deneche < > > > > [email protected]> > > > > wrote: > > > > > > > > > Another question: FixedWidthVector interface defines a zeroVector() > > > > method > > > > > that > > > > > "Zero out the underlying buffer backing this vector" according to > > it's > > > > > javadoc. > > > > > > > > > > Where does this method fit in the value vector states described > > > earlier ? > > > > > it doesn't clear the vector yet it doesn't reset everything to the > > > after > > > > > allocate state. > > > > > > > > > > On Tue, Aug 25, 2015 at 10:46 AM, Abdel Hakim Deneche < > > > > > [email protected] > > > > > > wrote: > > > > > > > > > > > One more question about the transition from allocate -> mutate. > For > > > > Fixed > > > > > > width vectors and BitVector you can actually call setSafe() > without > > > > > calling > > > > > > allocateNew() first and it will work. Should it throw an > exception > > > > > instead > > > > > > ? > > > > > > not calling allocateNew() has side effects that could cause > > setSafe() > > > > to > > > > > > throw an OversizedAllocationException if you call setSafe() then > > > > clear() > > > > > > multiple times. > > > > > > > > > > > > On Tue, Aug 25, 2015 at 10:01 AM, Chris Westin < > > > > [email protected]> > > > > > > wrote: > > > > > > > > > > > >> Maybe we should start by putting these rules in a comment in the > > > value > > > > > >> vector base interfaces? The lack of such information is why > there > > > are > > > > > >> deviations and other expectations. > > > > > >> > > > > > >> On Tue, Aug 25, 2015 at 8:22 AM, Jacques Nadeau < > > [email protected] > > > > > > > > > >> wrote: > > > > > >> > > > > > >> > There are a few unspoken "rules" around vectors: > > > > > >> > > > > > > >> > - values need to be written in order (e.g. index 0, 1, 2, 5) > > > > > >> > - null vectors start with all values as null before writing > > > anything > > > > > >> > - for variable width types, the offset vector should be all > > zeros > > > > > before > > > > > >> > writing > > > > > >> > - you must call setValueCount before a vector can be read > > > > > >> > - you should never write to a vector once it has been read. > > > > > >> > > > > > > >> > The ultimate goal we should get to the point where you the > > > > interfaces > > > > > >> > guarantee this order of operation: > > > > > >> > > > > > > >> > allocate > mutate > setvaluecount > access > clear (or > allocate > > to > > > > > start > > > > > >> > the process over, xxx). Any deviation from this pattern > should > > > > result > > > > > >> in > > > > > >> > exception. We should do this only in debug mode as this code > is > > > > > >> extremely > > > > > >> > performance sensitive. Operations like transfer should be > built > > > on > > > > > top > > > > > >> of > > > > > >> > this state model. (In that case, it would mean src moves to > > clear > > > > > state > > > > > >> > and target moves to access state. It also means that transfer > > > > should > > > > > >> only > > > > > >> > work in access state.) > > > > > >> > > > > > > >> > If we need special purpose data structures that don't operate > in > > > > these > > > > > >> > ways, we should make sure to keep them separate rather than > > trying > > > > to > > > > > >> > accommodate a deviation from this pattern in the core vector > > code. > > > > > >> > > > > > > >> > I wrote xxx above because I see the purpose of zeroVectors as > > > being > > > > a > > > > > >> reset > > > > > >> > on the vector state back to the original state. Maybe we > should > > > > > >> actually > > > > > >> > call it 'reset' rather than 'zeroVectors'. This would > basically > > > > pick > > > > > >> up at > > > > > >> > mutate mode again. > > > > > >> > > > > > > >> > Since these rules were never formalized, I'm sure there are a > > few > > > > > places > > > > > >> > where we currently deviate. We should enforce these rules and > > > then > > > > > get > > > > > >> > those issues fixed. > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > -- > > > > > >> > Jacques Nadeau > > > > > >> > CTO and Co-Founder, Dremio > > > > > >> > > > > > > >> > On Tue, Aug 25, 2015 at 8:02 AM, Abdel Hakim Deneche < > > > > > >> > [email protected]> > > > > > >> > wrote: > > > > > >> > > > > > > >> > > Another important point to keep in mind here: > > > > > >> ValueVectorWriteExpression > > > > > >> > > operates under the "undocumented" assumption that the > > > destination > > > > > >> vector > > > > > >> > is > > > > > >> > > empty, this way it can safely skip writing null values. In > the > > > > case > > > > > of > > > > > >> > > window functions I am using a value vector as an internal > > buffer > > > > to > > > > > >> hold > > > > > >> > > values between batches which voids the assumption. > > > > > >> > > If this assumption is indeed correct then adding zeroVector > to > > > > value > > > > > >> > > vectors is indeed the way to go. > > > > > >> > > > > > > > >> > > On Mon, Aug 24, 2015 at 8:30 PM, Jacques Nadeau < > > > > [email protected] > > > > > > > > > > > >> > > wrote: > > > > > >> > > > > > > > >> > > > In general, let's try to avoid extending the core > structures > > > > like > > > > > >> value > > > > > >> > > > vector read and write expressions for a single operator. > > > > > Zerovector > > > > > >> is > > > > > >> > > > trivial to implement so let's resolve that way (trivial > > since > > > > the > > > > > >> > > > underlying vector already has it and we just need to > > delegate > > > > > down). > > > > > >> > > > On Aug 24, 2015 3:36 PM, "Aman Sinha" < > [email protected] > > > > > > > > wrote: > > > > > >> > > > > > > > > >> > > > > I am reviewing Hakim's patch for DRILL-3668 (first_value > > > > window > > > > > >> > > function > > > > > >> > > > > incorrect result). His code uses > > ValueVectorWriteExpression > > > > to > > > > > >> set > > > > > >> > > > values > > > > > >> > > > > in an internal batch which get re-used across different > > > > > >> partitions of > > > > > >> > > the > > > > > >> > > > > window function. Ideally, we just want to zero out the > > > vector > > > > > >> rather > > > > > >> > > > than > > > > > >> > > > > calling clear() since clear() will release the buffer. > > > > > >> > > > > > > > > > >> > > > > However, currently zeroVectors() is only supported by > > > > > >> > FixedWidthVector, > > > > > >> > > > not > > > > > >> > > > > VariableWidthVector. * Should there be such an > interface > > > for > > > > > >> > variable > > > > > >> > > > > width ? * The implementation could zero out just the > > offset > > > > > >> vector. > > > > > >> > > > > > > > > > >> > > > > In the absence of such an interface, Hakim has added a > > > boolean > > > > > >> flag > > > > > >> > > > > witeNulls to ValueVectorWriteExpression (see > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > https://github.com/adeneche/incubator-drill/commit/cab73cd1a50163dd25fe0f9c55c264780ea3616d > > > > > >> > > > > ) > > > > > >> > > > > and is conditionally doing the null-ing out in the > > > generated > > > > > >> code. > > > > > >> > It > > > > > >> > > > > won't affect the normal code path, it would get used for > > > > > specific > > > > > >> > > window > > > > > >> > > > > functions. > > > > > >> > > > > > > > > > >> > > > > I am thinking of committing his patch and tracking the > > > > > >> zeroVectors() > > > > > >> > > > > enhancement separately (if people agree that it would be > > > > > useful). > > > > > >> > Let > > > > > >> > > me > > > > > >> > > > > know... > > > > > >> > > > > > > > > > >> > > > > Aman > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > -- > > > > > >> > > > > > > > >> > > Abdelhakim Deneche > > > > > >> > > > > > > > >> > > Software Engineer > > > > > >> > > > > > > > >> > > <http://www.mapr.com/> > > > > > >> > > > > > > > >> > > > > > > > >> > > Now Available - Free Hadoop On-Demand Training > > > > > >> > > < > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > Abdelhakim Deneche > > > > > > > > > > > > Software Engineer > > > > > > > > > > > > <http://www.mapr.com/> > > > > > > > > > > > > > > > > > > Now Available - Free Hadoop On-Demand Training > > > > > > < > > > > > > > > > > > > > > > http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > Abdelhakim Deneche > > > > > > > > > > Software Engineer > > > > > > > > > > <http://www.mapr.com/> > > > > > > > > > > > > > > > Now Available - Free Hadoop On-Demand Training > > > > > < > > > > > > > > > > > > > > > http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Abdelhakim Deneche > > > > > > Software Engineer > > > > > > <http://www.mapr.com/> > > > > > > > > > Now Available - Free Hadoop On-Demand Training > > > < > > > > > > http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available > > > > > > > > > > -- Julien
