ASF GitHub Bot commented on DRILL-5657:

Github user bitblender commented on a diff in the pull request:

    --- Diff: 
    @@ -0,0 +1,551 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.physical.rowSet.impl;
    +import java.util.ArrayList;
    +import java.util.List;
    +import org.apache.drill.common.types.TypeProtos.DataMode;
    +import org.apache.drill.exec.expr.TypeHelper;
    +import org.apache.drill.exec.physical.rowSet.ColumnLoader;
    +import org.apache.drill.exec.physical.rowSet.TupleLoader;
    +import org.apache.drill.exec.physical.rowSet.TupleSchema;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.record.MaterializedField;
    +import org.apache.drill.exec.vector.AllocationHelper;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.apache.drill.exec.vector.VectorOverflowException;
    +import org.apache.drill.exec.vector.accessor.impl.AbstractColumnWriter;
    +import org.apache.drill.exec.vector.accessor.impl.ColumnAccessorFactory;
    +/**
    + * Implementation of a column when creating a row batch.
    + * Every column resides at an index, is defined by a schema,
    + * is backed by a value vector, and is written to by a writer.
    + * Each column also tracks the schema version in which it was added
    + * to detect schema evolution. Each column has an optional overflow
    + * vector that holds overflow record values when a batch becomes
    + * full.
    + * <p>
    + * Overflow vectors require special consideration. The vector class itself
    + * must remain constant as it is bound to the writer. To handle overflow,
    + * the implementation must replace the buffer in the vector with a new
    + * one, saving the full vector to return as part of the final row batch.
    + * This puts the column in one of three states:
    + * <ul>
    + * <li>Normal: only one vector is of concern - the vector for the active
    + * row batch.</li>
    + * <li>Overflow: a write to a vector caused overflow. For all columns,
    + * the data buffer is shifted to a harvested vector, and a new, empty
    + * buffer is put into the active vector.</li>
    + * <li>Excess: a (small) column received values for the row that will
    --- End diff --
    'Excess' is the LOOK_AHEAD state, correct? I think it would be better if the comments used the same terminology as the code.
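To make the three states in the Javadoc concrete, here is a minimal, hypothetical Java sketch of the buffer swap it describes. It assumes, following the reviewer's reading, that "Excess" is the state the code calls LOOK_AHEAD, and that each column holds one fixed-width int per row. The names `IntVector`, `Writer`, `Column`, `rollOver()`, and `runDemo()` are illustrative only and are not Drill's classes. The point it shows is that the writer stays bound to the same vector object while that vector's buffer is replaced.

```java
/**
 * Hypothetical sketch of the Normal / Overflow / Excess column states.
 * None of these names are Drill's API; LOOK_AHEAD stands in for "Excess".
 */
public class OverflowSketch {

  enum State { NORMAL, OVERFLOW, LOOK_AHEAD }

  /** Stand-in for a value vector: the object is fixed, its buffer can be swapped. */
  static final class IntVector {
    int[] buffer;
    int valueCount;

    IntVector(int[] buffer, int valueCount) {
      this.buffer = buffer;
      this.valueCount = valueCount;
    }
  }

  /** The writer is bound to one vector instance for its whole lifetime. */
  static final class Writer {
    final IntVector vector;

    Writer(IntVector vector) { this.vector = vector; }

    /** Returns false instead of writing when the current buffer is full. */
    boolean write(int value) {
      if (vector.valueCount == vector.buffer.length) {
        return false;
      }
      vector.buffer[vector.valueCount++] = value;
      return true;
    }
  }

  static final class Column {
    final IntVector active;
    final Writer writer;
    IntVector harvested;
    State state = State.NORMAL;

    Column(int capacity) {
      active = new IntVector(new int[capacity], 0);
      writer = new Writer(active);
    }

    /** Rolls over at value index rowStart, the first value of the in-progress row. */
    void rollOver(int rowStart) {
      int[] full = active.buffer;
      int carried = active.valueCount - rowStart;
      // Values already written for the in-progress row move to a fresh buffer.
      int[] fresh = new int[full.length];
      System.arraycopy(full, rowStart, fresh, 0, carried);
      // The harvested vector takes the old buffer; completed rows are not copied.
      harvested = new IntVector(full, rowStart);
      // Same vector object (still bound to the writer), new buffer.
      active.buffer = fresh;
      active.valueCount = carried;
      state = carried > 0 ? State.LOOK_AHEAD : State.OVERFLOW;
    }
  }

  /** Writes rows 0..2; the big column overflows on row 2 and every column rolls over. */
  static Column[] runDemo() {
    Column small = new Column(4); // small column with spare room
    Column big = new Column(2);   // column that fills first
    for (int row = 0; row < 3; row++) {
      small.writer.write(row);
      if (!big.writer.write(row * 10)) {
        // One value per row, so the row index is also the value index.
        small.rollOver(row);
        big.rollOver(row);
        big.writer.write(row * 10); // retry into the fresh buffer
      }
    }
    return new Column[] {small, big};
  }

  public static void main(String[] args) {
    Column[] cols = runDemo();
    System.out.println(cols[0].state + " " + cols[1].state); // LOOK_AHEAD OVERFLOW
  }
}
```

In this demo the write of row 2 overflows the large column. The small column has already accepted its row-2 value, so that value is carried into the new buffer and the column ends in LOOK_AHEAD; the large column ends in OVERFLOW and retries its write into the fresh buffer. Handing the old buffer to the harvested vector, rather than copying it, mirrors the Javadoc's description of the data buffer being "shifted" to a harvested vector.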

> Implement size-aware result set loader
> --------------------------------------
>                 Key: DRILL-5657
>                 URL: https://issues.apache.org/jira/browse/DRILL-5657
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: Future
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: Future
> A recent extension to Drill's set of test tools created a "row set" 
> abstraction to allow us to create, and verify, record batches with very few 
> lines of code. Part of this work involved creating a set of "column 
> accessors" in the vector subsystem. Column readers provide a uniform API to 
> obtain data from columns (vectors), while column writers provide a uniform 
> writing interface.
> DRILL-5211 discusses a set of changes to limit value vectors to 16 MB in size 
> (to avoid memory fragmentation due to Drill's two memory allocators). The 
> column accessors have proven to be so useful that they will be the basis for 
> the new, size-aware writers used by Drill's record readers.
> A step in that direction is to retrofit the column writers to use the 
> size-aware {{setScalar()}} and {{setArray()}} methods introduced in 
> DRILL-5517.
> Since the test framework row set classes are (at present) the only consumers 
> of the accessors, those classes must also be updated with the changes.
> This then allows us to add a new "row mutator" class that handles size-aware 
> vector writing, including the case in which a vector fills in the middle of a 
> row.
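The size-aware writing described above can be illustrated with a hedged sketch: a variable-width writer that checks a byte cap (16 MB by default, following the DRILL-5211 limit) before each write and reports overflow instead of growing past it. The class names, `setBytes()`, `fillUntilOverflow()`, and the local `OverflowException` are hypothetical; they do not reproduce the signatures of the `setScalar()`/`setArray()` methods from DRILL-5517 or of Drill's `VectorOverflowException`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch of a size-aware, variable-width column writer. */
public class SizeAwareWriterSketch {

  static final int MAX_VECTOR_BYTES = 16 * 1024 * 1024; // 16 MB cap

  /** Illustrative local exception, not Drill's VectorOverflowException. */
  static final class OverflowException extends Exception {
    OverflowException(String message) { super(message); }
  }

  static final class VarWidthWriter {
    private final int limit;
    private byte[] data = new byte[256];
    private int used; // bytes written so far

    VarWidthWriter(int limit) { this.limit = limit; }

    VarWidthWriter() { this(MAX_VECTOR_BYTES); }

    /** Appends value, or throws OverflowException if it would pass the cap. */
    void setBytes(byte[] value) throws OverflowException {
      if ((long) used + value.length > limit) {
        throw new OverflowException("write would exceed " + limit + " bytes");
      }
      if (used + value.length > data.length) {
        // Grow by doubling, but never beyond the cap.
        int newSize = Math.min(limit, Math.max(data.length * 2, used + value.length));
        data = Arrays.copyOf(data, newSize);
      }
      System.arraycopy(value, 0, data, used, value.length);
      used += value.length;
    }

    int used() { return used; }
  }

  /** Writes value repeatedly until the writer reports overflow; returns the count. */
  static int fillUntilOverflow(VarWidthWriter writer, byte[] value) {
    if (value.length == 0) {
      throw new IllegalArgumentException("empty values never overflow");
    }
    int written = 0;
    try {
      while (true) {
        writer.setBytes(value);
        written++;
      }
    } catch (OverflowException e) {
      return written;
    }
  }

  public static void main(String[] args) {
    VarWidthWriter writer = new VarWidthWriter(10); // tiny cap for the demo
    int n = fillUntilOverflow(writer, "abcd".getBytes(StandardCharsets.UTF_8));
    System.out.println("wrote " + n + " values, " + writer.used() + " bytes");
  }
}
```

With the demo's 10-byte cap and 4-byte values, two writes succeed and the third reports overflow, so `main` prints `wrote 2 values, 8 bytes`. In a loader built this way, that overflow signal is the point at which a mid-row rollover, like the one described in the Javadoc under review, would begin.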

This message was sent by Atlassian JIRA