I think the zero vectors approach is fine. I don't think we should merge the value vector read/write expression changes. On Aug 27, 2015 9:03 PM, "Aman Sinha" <[email protected]> wrote:
> Based on the discussion it looks like doing what the first_value/last_value > window function needs from the value vectors without violating the > recommended state transition requires a bit more thought such that we don't > introduce regression. Since testing is blocked on these for 1.2, can Hakim > proceed with his current fix ? We could create a JIRA to revisit it post > 1.2... > > Aman > > On Wed, Aug 26, 2015 at 3:07 PM, Julien Le Dem <[email protected]> wrote: > > > I can take a look at the Vectors and add asserts to enforce the contract > is > > respected. > > > > On Wed, Aug 26, 2015 at 2:52 PM, Steven Phillips <[email protected]> > > wrote: > > > > > One possible exception to the access pattern occurs when vectors wrap > > other > > > vectors. Specifically, the offset vectors in Variable Length and > Repeated > > > vectors. These vectors are accessed and mutated multiple times. If we > are > > > going to implement strict enforcement, we need to consider that case. > > > > > > On Tue, Aug 25, 2015 at 7:15 PM, Jacques Nadeau <[email protected]> > > > wrote: > > > > > > > Yes, by recommendation is to correct the usage in StreamingAggBatch > > > > > > > > -- > > > > Jacques Nadeau > > > > CTO and Co-Founder, Dremio > > > > > > > > On Tue, Aug 25, 2015 at 4:52 PM, Abdel Hakim Deneche < > > > > [email protected]> > > > > wrote: > > > > > > > > > I think zeroVector() is mainly used to fill the vector with zeros, > > > which > > > > is > > > > > fine if you call it while the vector is in "mutate" state, but > > > > > StreamingAggBatch does actually call it after setting the value > count > > > of > > > > > the value vector which is against the paradigm. > > > > > > > > > > > > > > > On Tue, Aug 25, 2015 at 3:51 PM, Jacques Nadeau < > [email protected]> > > > > > wrote: > > > > > > > > > > > In all but one situations, this is an internal concern (making > sure > > > to > > > > > zero > > > > > > out the memory). For fixed width vectors, there is an assumption > > > that > > > > an > > > > > > initial allocation is clean memory (e.g. all zeros in the faces > of > > an > > > > int > > > > > > vector). So this should be pulled off a public vector interface. > > > The > > > > > one > > > > > > place where it is being used today is StreamingAggBatch and I > think > > > we > > > > > > should fix that to follow the state paradigm described above. > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Jacques Nadeau > > > > > > CTO and Co-Founder, Dremio > > > > > > > > > > > > On Tue, Aug 25, 2015 at 3:41 PM, Abdel Hakim Deneche < > > > > > > [email protected]> > > > > > > wrote: > > > > > > > > > > > > > Another question: FixedWidthVector interface defines a > > zeroVector() > > > > > > method > > > > > > > that > > > > > > > "Zero out the underlying buffer backing this vector" according > to > > > > it's > > > > > > > javadoc. > > > > > > > > > > > > > > Where does this method fit in the value vector states described > > > > > earlier ? > > > > > > > it doesn't clear the vector yet it doesn't reset everything to > > the > > > > > after > > > > > > > allocate state. > > > > > > > > > > > > > > On Tue, Aug 25, 2015 at 10:46 AM, Abdel Hakim Deneche < > > > > > > > [email protected] > > > > > > > > wrote: > > > > > > > > > > > > > > > One more question about the transition from allocate -> > mutate. > > > For > > > > > > Fixed > > > > > > > > width vectors and BitVector you can actually call setSafe() > > > without > > > > > > > calling > > > > > > > > allocateNew() first and it will work. Should it throw an > > > exception > > > > > > > instead > > > > > > > > ? > > > > > > > > not calling allocateNew() has side effects that could cause > > > > setSafe() > > > > > > to > > > > > > > > throw an OversizedAllocationException if you call setSafe() > > then > > > > > > clear() > > > > > > > > multiple times. > > > > > > > > > > > > > > > > On Tue, Aug 25, 2015 at 10:01 AM, Chris Westin < > > > > > > [email protected]> > > > > > > > > wrote: > > > > > > > > > > > > > > > >> Maybe we should start by putting these rules in a comment in > > the > > > > > value > > > > > > > >> vector base interfaces? The lack of such information is why > > > there > > > > > are > > > > > > > >> deviations and other expectations. > > > > > > > >> > > > > > > > >> On Tue, Aug 25, 2015 at 8:22 AM, Jacques Nadeau < > > > > [email protected] > > > > > > > > > > > > > >> wrote: > > > > > > > >> > > > > > > > >> > There are a few unspoken "rules" around vectors: > > > > > > > >> > > > > > > > > >> > - values need to be written in order (e.g. index 0, 1, 2, > 5) > > > > > > > >> > - null vectors start with all values as null before > writing > > > > > anything > > > > > > > >> > - for variable width types, the offset vector should be > all > > > > zeros > > > > > > > before > > > > > > > >> > writing > > > > > > > >> > - you must call setValueCount before a vector can be read > > > > > > > >> > - you should never write to a vector once it has been > read. > > > > > > > >> > > > > > > > > >> > The ultimate goal we should get to the point where you the > > > > > > interfaces > > > > > > > >> > guarantee this order of operation: > > > > > > > >> > > > > > > > > >> > allocate > mutate > setvaluecount > access > clear (or > > > allocate > > > > to > > > > > > > start > > > > > > > >> > the process over, xxx). Any deviation from this pattern > > > should > > > > > > result > > > > > > > >> in > > > > > > > >> > exception. We should do this only in debug mode as this > > code > > > is > > > > > > > >> extremely > > > > > > > >> > performance sensitive. Operations like transfer should be > > > built > > > > > on > > > > > > > top > > > > > > > >> of > > > > > > > >> > this state model. (In that case, it would mean src moves > to > > > > clear > > > > > > > state > > > > > > > >> > and target moves to access state. It also means that > > transfer > > > > > > should > > > > > > > >> only > > > > > > > >> > work in access state.) > > > > > > > >> > > > > > > > > >> > If we need special purpose data structures that don't > > operate > > > in > > > > > > these > > > > > > > >> > ways, we should make sure to keep them separate rather > than > > > > trying > > > > > > to > > > > > > > >> > accommodate a deviation from this pattern in the core > vector > > > > code. > > > > > > > >> > > > > > > > > >> > I wrote xxx above because I see the purpose of zeroVectors > > as > > > > > being > > > > > > a > > > > > > > >> reset > > > > > > > >> > on the vector state back to the original state. Maybe we > > > should > > > > > > > >> actually > > > > > > > >> > call it 'reset' rather than 'zeroVectors'. This would > > > basically > > > > > > pick > > > > > > > >> up at > > > > > > > >> > mutate mode again. > > > > > > > >> > > > > > > > > >> > Since these rules were never formalized, I'm sure there > are > > a > > > > few > > > > > > > places > > > > > > > >> > where we currently deviate. We should enforce these rules > > and > > > > > then > > > > > > > get > > > > > > > >> > those issues fixed. > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > -- > > > > > > > >> > Jacques Nadeau > > > > > > > >> > CTO and Co-Founder, Dremio > > > > > > > >> > > > > > > > > >> > On Tue, Aug 25, 2015 at 8:02 AM, Abdel Hakim Deneche < > > > > > > > >> > [email protected]> > > > > > > > >> > wrote: > > > > > > > >> > > > > > > > > >> > > Another important point to keep in mind here: > > > > > > > >> ValueVectorWriteExpression > > > > > > > >> > > operates under the "undocumented" assumption that the > > > > > destination > > > > > > > >> vector > > > > > > > >> > is > > > > > > > >> > > empty, this way it can safely skip writing null values. > In > > > the > > > > > > case > > > > > > > of > > > > > > > >> > > window functions I am using a value vector as an > internal > > > > buffer > > > > > > to > > > > > > > >> hold > > > > > > > >> > > values between batches which voids the assumption. > > > > > > > >> > > If this assumption is indeed correct then adding > > zeroVector > > > to > > > > > > value > > > > > > > >> > > vectors is indeed the way to go. > > > > > > > >> > > > > > > > > > >> > > On Mon, Aug 24, 2015 at 8:30 PM, Jacques Nadeau < > > > > > > [email protected] > > > > > > > > > > > > > > > >> > > wrote: > > > > > > > >> > > > > > > > > > >> > > > In general, let's try to avoid extending the core > > > structures > > > > > > like > > > > > > > >> value > > > > > > > >> > > > vector read and write expressions for a single > operator. > > > > > > > Zerovector > > > > > > > >> is > > > > > > > >> > > > trivial to implement so let's resolve that way > (trivial > > > > since > > > > > > the > > > > > > > >> > > > underlying vector already has it and we just need to > > > > delegate > > > > > > > down). > > > > > > > >> > > > On Aug 24, 2015 3:36 PM, "Aman Sinha" < > > > [email protected] > > > > > > > > > > > > wrote: > > > > > > > >> > > > > > > > > > > >> > > > > I am reviewing Hakim's patch for DRILL-3668 > > (first_value > > > > > > window > > > > > > > >> > > function > > > > > > > >> > > > > incorrect result). His code uses > > > > ValueVectorWriteExpression > > > > > > to > > > > > > > >> set > > > > > > > >> > > > values > > > > > > > >> > > > > in an internal batch which get re-used across > > different > > > > > > > >> partitions of > > > > > > > >> > > the > > > > > > > >> > > > > window function. Ideally, we just want to zero out > > the > > > > > vector > > > > > > > >> rather > > > > > > > >> > > > than > > > > > > > >> > > > > calling clear() since clear() will release the > buffer. > > > > > > > >> > > > > > > > > > > > >> > > > > However, currently zeroVectors() is only supported > by > > > > > > > >> > FixedWidthVector, > > > > > > > >> > > > not > > > > > > > >> > > > > VariableWidthVector. * Should there be such an > > > interface > > > > > for > > > > > > > >> > variable > > > > > > > >> > > > > width ? * The implementation could zero out just the > > > > offset > > > > > > > >> vector. > > > > > > > >> > > > > > > > > > > > >> > > > > In the absence of such an interface, Hakim has > added a > > > > > boolean > > > > > > > >> flag > > > > > > > >> > > > > witeNulls to ValueVectorWriteExpression (see > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/adeneche/incubator-drill/commit/cab73cd1a50163dd25fe0f9c55c264780ea3616d > > > > > > > >> > > > > ) > > > > > > > >> > > > > and is conditionally doing the null-ing out in the > > > > > generated > > > > > > > >> code. > > > > > > > >> > It > > > > > > > >> > > > > won't affect the normal code path, it would get used > > for > > > > > > > specific > > > > > > > >> > > window > > > > > > > >> > > > > functions. > > > > > > > >> > > > > > > > > > > > >> > > > > I am thinking of committing his patch and tracking > the > > > > > > > >> zeroVectors() > > > > > > > >> > > > > enhancement separately (if people agree that it > would > > be > > > > > > > useful). > > > > > > > >> > Let > > > > > > > >> > > me > > > > > > > >> > > > > know... > > > > > > > >> > > > > > > > > > > > >> > > > > Aman > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > -- > > > > > > > >> > > > > > > > > > >> > > Abdelhakim Deneche > > > > > > > >> > > > > > > > > > >> > > Software Engineer > > > > > > > >> > > > > > > > > > >> > > <http://www.mapr.com/> > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > > Now Available - Free Hadoop On-Demand Training > > > > > > > >> > > < > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > Abdelhakim Deneche > > > > > > > > > > > > > > > > Software Engineer > > > > > > > > > > > > > > > > <http://www.mapr.com/> > > > > > > > > > > > > > > > > > > > > > > > > Now Available - Free Hadoop On-Demand Training > > > > > > > > < > > > > > > > > > > > > > > > > > > > > > > > > > > > > http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > Abdelhakim Deneche > > > > > > > > > > > > > > Software Engineer > > > > > > > > > > > > > > <http://www.mapr.com/> > > > > > > > > > > > > > > > > > > > > > Now Available - Free Hadoop On-Demand Training > > > > > > > < > > > > > > > > > > > > > > > > > > > > > > > > > > > > http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > Abdelhakim Deneche > > > > > > > > > > Software Engineer > > > > > > > > > > <http://www.mapr.com/> > > > > > > > > > > > > > > > Now Available - Free Hadoop On-Demand Training > > > > > < > > > > > > > > > > > > > > > http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > Julien > > >
