Another question: FixedWidthVector interface defines a zeroVector() method that "Zero out the underlying buffer backing this vector" according to it's javadoc.
Where does this method fit in the value vector states described earlier ? it doesn't clear the vector yet it doesn't reset everything to the after allocate state. On Tue, Aug 25, 2015 at 10:46 AM, Abdel Hakim Deneche <[email protected] > wrote: > One more question about the transition from allocate -> mutate. For Fixed > width vectors and BitVector you can actually call setSafe() without calling > allocateNew() first and it will work. Should it throw an exception instead > ? > not calling allocateNew() has side effects that could cause setSafe() to > throw an OversizedAllocationException if you call setSafe() then clear() > multiple times. > > On Tue, Aug 25, 2015 at 10:01 AM, Chris Westin <[email protected]> > wrote: > >> Maybe we should start by putting these rules in a comment in the value >> vector base interfaces? The lack of such information is why there are >> deviations and other expectations. >> >> On Tue, Aug 25, 2015 at 8:22 AM, Jacques Nadeau <[email protected]> >> wrote: >> >> > There are a few unspoken "rules" around vectors: >> > >> > - values need to be written in order (e.g. index 0, 1, 2, 5) >> > - null vectors start with all values as null before writing anything >> > - for variable width types, the offset vector should be all zeros before >> > writing >> > - you must call setValueCount before a vector can be read >> > - you should never write to a vector once it has been read. >> > >> > The ultimate goal we should get to the point where you the interfaces >> > guarantee this order of operation: >> > >> > allocate > mutate > setvaluecount > access > clear (or allocate to start >> > the process over, xxx). Any deviation from this pattern should result >> in >> > exception. We should do this only in debug mode as this code is >> extremely >> > performance sensitive. Operations like transfer should be built on top >> of >> > this state model. (In that case, it would mean src moves to clear state >> > and target moves to access state. It also means that transfer should >> only >> > work in access state.) >> > >> > If we need special purpose data structures that don't operate in these >> > ways, we should make sure to keep them separate rather than trying to >> > accommodate a deviation from this pattern in the core vector code. >> > >> > I wrote xxx above because I see the purpose of zeroVectors as being a >> reset >> > on the vector state back to the original state. Maybe we should >> actually >> > call it 'reset' rather than 'zeroVectors'. This would basically pick >> up at >> > mutate mode again. >> > >> > Since these rules were never formalized, I'm sure there are a few places >> > where we currently deviate. We should enforce these rules and then get >> > those issues fixed. >> > >> > >> > >> > -- >> > Jacques Nadeau >> > CTO and Co-Founder, Dremio >> > >> > On Tue, Aug 25, 2015 at 8:02 AM, Abdel Hakim Deneche < >> > [email protected]> >> > wrote: >> > >> > > Another important point to keep in mind here: >> ValueVectorWriteExpression >> > > operates under the "undocumented" assumption that the destination >> vector >> > is >> > > empty, this way it can safely skip writing null values. In the case of >> > > window functions I am using a value vector as an internal buffer to >> hold >> > > values between batches which voids the assumption. >> > > If this assumption is indeed correct then adding zeroVector to value >> > > vectors is indeed the way to go. >> > > >> > > On Mon, Aug 24, 2015 at 8:30 PM, Jacques Nadeau <[email protected]> >> > > wrote: >> > > >> > > > In general, let's try to avoid extending the core structures like >> value >> > > > vector read and write expressions for a single operator. Zerovector >> is >> > > > trivial to implement so let's resolve that way (trivial since the >> > > > underlying vector already has it and we just need to delegate down). >> > > > On Aug 24, 2015 3:36 PM, "Aman Sinha" <[email protected]> wrote: >> > > > >> > > > > I am reviewing Hakim's patch for DRILL-3668 (first_value window >> > > function >> > > > > incorrect result). His code uses ValueVectorWriteExpression to >> set >> > > > values >> > > > > in an internal batch which get re-used across different >> partitions of >> > > the >> > > > > window function. Ideally, we just want to zero out the vector >> rather >> > > > than >> > > > > calling clear() since clear() will release the buffer. >> > > > > >> > > > > However, currently zeroVectors() is only supported by >> > FixedWidthVector, >> > > > not >> > > > > VariableWidthVector. * Should there be such an interface for >> > variable >> > > > > width ? * The implementation could zero out just the offset >> vector. >> > > > > >> > > > > In the absence of such an interface, Hakim has added a boolean >> flag >> > > > > witeNulls to ValueVectorWriteExpression (see >> > > > > >> > > > > >> > > > >> > > >> > >> https://github.com/adeneche/incubator-drill/commit/cab73cd1a50163dd25fe0f9c55c264780ea3616d >> > > > > ) >> > > > > and is conditionally doing the null-ing out in the generated >> code. >> > It >> > > > > won't affect the normal code path, it would get used for specific >> > > window >> > > > > functions. >> > > > > >> > > > > I am thinking of committing his patch and tracking the >> zeroVectors() >> > > > > enhancement separately (if people agree that it would be useful). >> > Let >> > > me >> > > > > know... >> > > > > >> > > > > Aman >> > > > > >> > > > >> > > >> > > >> > > >> > > -- >> > > >> > > Abdelhakim Deneche >> > > >> > > Software Engineer >> > > >> > > <http://www.mapr.com/> >> > > >> > > >> > > Now Available - Free Hadoop On-Demand Training >> > > < >> > > >> > >> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available >> > > > >> > > >> > >> > > > > -- > > Abdelhakim Deneche > > Software Engineer > > <http://www.mapr.com/> > > > Now Available - Free Hadoop On-Demand Training > <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available> > -- Abdelhakim Deneche Software Engineer <http://www.mapr.com/> Now Available - Free Hadoop On-Demand Training <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
