One more question about the transition from allocate -> mutate. For Fixed width vectors and BitVector you can actually call setSafe() without calling allocateNew() first and it will work. Should it throw an exception instead ? not calling allocateNew() has side effects that could cause setSafe() to throw an OversizedAllocationException if you call setSafe() then clear() multiple times.
On Tue, Aug 25, 2015 at 10:01 AM, Chris Westin <[email protected]> wrote: > Maybe we should start by putting these rules in a comment in the value > vector base interfaces? The lack of such information is why there are > deviations and other expectations. > > On Tue, Aug 25, 2015 at 8:22 AM, Jacques Nadeau <[email protected]> > wrote: > > > There are a few unspoken "rules" around vectors: > > > > - values need to be written in order (e.g. index 0, 1, 2, 5) > > - null vectors start with all values as null before writing anything > > - for variable width types, the offset vector should be all zeros before > > writing > > - you must call setValueCount before a vector can be read > > - you should never write to a vector once it has been read. > > > > The ultimate goal we should get to the point where you the interfaces > > guarantee this order of operation: > > > > allocate > mutate > setvaluecount > access > clear (or allocate to start > > the process over, xxx). Any deviation from this pattern should result in > > exception. We should do this only in debug mode as this code is > extremely > > performance sensitive. Operations like transfer should be built on top > of > > this state model. (In that case, it would mean src moves to clear state > > and target moves to access state. It also means that transfer should > only > > work in access state.) > > > > If we need special purpose data structures that don't operate in these > > ways, we should make sure to keep them separate rather than trying to > > accommodate a deviation from this pattern in the core vector code. > > > > I wrote xxx above because I see the purpose of zeroVectors as being a > reset > > on the vector state back to the original state. Maybe we should actually > > call it 'reset' rather than 'zeroVectors'. This would basically pick up > at > > mutate mode again. > > > > Since these rules were never formalized, I'm sure there are a few places > > where we currently deviate. We should enforce these rules and then get > > those issues fixed. > > > > > > > > -- > > Jacques Nadeau > > CTO and Co-Founder, Dremio > > > > On Tue, Aug 25, 2015 at 8:02 AM, Abdel Hakim Deneche < > > [email protected]> > > wrote: > > > > > Another important point to keep in mind here: > ValueVectorWriteExpression > > > operates under the "undocumented" assumption that the destination > vector > > is > > > empty, this way it can safely skip writing null values. In the case of > > > window functions I am using a value vector as an internal buffer to > hold > > > values between batches which voids the assumption. > > > If this assumption is indeed correct then adding zeroVector to value > > > vectors is indeed the way to go. > > > > > > On Mon, Aug 24, 2015 at 8:30 PM, Jacques Nadeau <[email protected]> > > > wrote: > > > > > > > In general, let's try to avoid extending the core structures like > value > > > > vector read and write expressions for a single operator. Zerovector > is > > > > trivial to implement so let's resolve that way (trivial since the > > > > underlying vector already has it and we just need to delegate down). > > > > On Aug 24, 2015 3:36 PM, "Aman Sinha" <[email protected]> wrote: > > > > > > > > > I am reviewing Hakim's patch for DRILL-3668 (first_value window > > > function > > > > > incorrect result). His code uses ValueVectorWriteExpression to set > > > > values > > > > > in an internal batch which get re-used across different partitions > of > > > the > > > > > window function. Ideally, we just want to zero out the vector > rather > > > > than > > > > > calling clear() since clear() will release the buffer. > > > > > > > > > > However, currently zeroVectors() is only supported by > > FixedWidthVector, > > > > not > > > > > VariableWidthVector. * Should there be such an interface for > > variable > > > > > width ? * The implementation could zero out just the offset vector. > > > > > > > > > > In the absence of such an interface, Hakim has added a boolean flag > > > > > witeNulls to ValueVectorWriteExpression (see > > > > > > > > > > > > > > > > > > > > https://github.com/adeneche/incubator-drill/commit/cab73cd1a50163dd25fe0f9c55c264780ea3616d > > > > > ) > > > > > and is conditionally doing the null-ing out in the generated code. > > It > > > > > won't affect the normal code path, it would get used for specific > > > window > > > > > functions. > > > > > > > > > > I am thinking of committing his patch and tracking the > zeroVectors() > > > > > enhancement separately (if people agree that it would be useful). > > Let > > > me > > > > > know... > > > > > > > > > > Aman > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Abdelhakim Deneche > > > > > > Software Engineer > > > > > > <http://www.mapr.com/> > > > > > > > > > Now Available - Free Hadoop On-Demand Training > > > < > > > > > > http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available > > > > > > > > > > -- Abdelhakim Deneche Software Engineer <http://www.mapr.com/> Now Available - Free Hadoop On-Demand Training <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
