The selection vector is the same concept. I'm not sure whether either of the others uses a hyperbatch or whether it is new to Drill.
J

On Aug 6, 2013 7:27 PM, "Timothy Chen" <[email protected]> wrote:
> Ah gotcha, it's the same concept in MonetDB and what the Hive batch query
> engine is using too. Didn't know they call it HyperBatch (unless you
> invented it?)
>
> Tim
>
>
> On Tue, Aug 6, 2013 at 6:53 PM, Jacques Nadeau <[email protected]> wrote:
>
> > Someone was asking me about the HyperBatch concept that a recent
> > commit introduced. The idea is pretty simple. We currently have a
> > two byte selection vector that we can use to mask a portion of a
> > columnar record batch before we rewrite it. This is to help in
> > situations where the rewrite would be unwarranted given the subsequent
> > operator. This works great for non-blocking operators.
> >
> > In the case of blocking operators such as sort, this becomes a bit
> > harder. (Especially in the case of schema changes, which I won't
> > discuss here.) One solution is to generate this new thing called a
> > hyperbatch. It looks kind of like a batch but it carries a
> > SelectionVector4 with it. The SV4 describes not only the valid
> > records, but also their location within a set of multiple support
> > record batches. This is encoded as two unsigned bytes for the record
> > batch index followed by two unsigned bytes for the individual record
> > (4B records max). In these cases, a (hyper)batch doesn't hold a
> > ValueVector for each field but rather an indexed array of
> > ValueVectors. This allows a pointer sort to be completed without
> > rewriting the columnar oriented data until required (typically when
> > writing to disk or socket). In the meantime, some additional
> > operators can be pipelined with only small modifications. If we get
> > to the point that a particular operator no longer supports an SV4 input
> > batch, we insert a SelectionVectorRemover to rewrite the data to the
> > more standard record batch format.
> >
> > You can see an example of the interaction at line 68 of this file:
> >
> > https://github.com/apache/incubator-drill/blob/db3afaa854fc8475592907dba97162ecf869f9df/sandbox/prototype/exec/java-exec/src/main/java/org/apache/drill/exec/expr/CodeGenerator.java
> >
> > thanks,
> > Jacques
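
For anyone following along, the SV4 encoding and pointer sort described in the quoted message can be sketched roughly as below. This is only an illustration under assumptions, not Drill's code: the class Sv4Sketch and its methods are hypothetical names, each "batch" is a plain int[] standing in for one column's ValueVector, and the batch index is assumed to sit in the upper 16 bits of each entry. The materialize method is a stand-in for the rewrite a SelectionVectorRemover would perform.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch; this is not Drill's SelectionVector4 implementation.
final class Sv4Sketch {
    // Pack a batch index (assumed upper 16 bits) and a record index (lower 16 bits).
    static int encode(int batchIndex, int recordIndex) {
        if (batchIndex < 0 || batchIndex > 0xFFFF || recordIndex < 0 || recordIndex > 0xFFFF) {
            throw new IllegalArgumentException("both indexes must fit in 16 unsigned bits");
        }
        return (batchIndex << 16) | recordIndex;
    }

    static int batchIndex(int entry) {
        return entry >>> 16; // unsigned shift so batch 65535 decodes correctly
    }

    static int recordIndex(int entry) {
        return entry & 0xFFFF;
    }

    // Pointer sort: order the four-byte entries by the values they point to.
    // The batches themselves are never modified.
    static int[] pointerSort(int[][] batches) {
        int total = 0;
        for (int[] batch : batches) {
            total += batch.length;
        }
        Integer[] entries = new Integer[total];
        int n = 0;
        for (int b = 0; b < batches.length; b++) {
            for (int r = 0; r < batches[b].length; r++) {
                entries[n++] = encode(b, r);
            }
        }
        Arrays.sort(entries,
                Comparator.comparingInt(e -> batches[batchIndex(e)][recordIndex(e)]));
        int[] sv4 = new int[total];
        for (int i = 0; i < total; i++) {
            sv4[i] = entries[i];
        }
        return sv4;
    }

    // Rewrite step: copy the values out in SV4 order into one plain array.
    static int[] materialize(int[][] batches, int[] sv4) {
        int[] out = new int[sv4.length];
        for (int i = 0; i < sv4.length; i++) {
            out[i] = batches[batchIndex(sv4[i])][recordIndex(sv4[i])];
        }
        return out;
    }
}
```

With two batches {30, 10} and {20, 40}, pointerSort returns entries that point at (batch 0, record 1), (batch 1, record 0), (batch 0, record 0), (batch 1, record 1), and the data is only copied when materialize is called. Sixteen bits for each half gives 65,536 batches of 65,536 records, which is where the roughly 4B record limit comes from.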
