[ https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129855#comment-16129855 ]

ASF GitHub Bot commented on DRILL-5657:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/866#discussion_r133618655
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/TupleSetImpl.java ---
    @@ -0,0 +1,551 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.physical.rowSet.impl;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.drill.common.types.TypeProtos.DataMode;
    +import org.apache.drill.exec.expr.TypeHelper;
    +import org.apache.drill.exec.physical.rowSet.ColumnLoader;
    +import org.apache.drill.exec.physical.rowSet.TupleLoader;
    +import org.apache.drill.exec.physical.rowSet.TupleSchema;
    +import org.apache.drill.exec.physical.rowSet.impl.ResultSetLoaderImpl.VectorContainerBuilder;
    +import org.apache.drill.exec.record.BatchSchema;
    +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
    +import org.apache.drill.exec.record.MaterializedField;
    +import org.apache.drill.exec.vector.AllocationHelper;
    +import org.apache.drill.exec.vector.ValueVector;
    +import org.apache.drill.exec.vector.VectorOverflowException;
    +import org.apache.drill.exec.vector.accessor.impl.AbstractColumnWriter;
    +import org.apache.drill.exec.vector.accessor.impl.ColumnAccessorFactory;
    +
    +/**
    + * Implementation of a column when creating a row batch.
    + * Every column resides at an index, is defined by a schema,
    + * is backed by a value vector, and is written to by a writer.
    + * Each column also tracks the schema version in which it was added
    + * to detect schema evolution. Each column has an optional overflow
    + * vector that holds overflow record values when a batch becomes
    + * full.
    + * <p>
    + * Overflow vectors require special consideration. The vector class itself
    + * must remain constant as it is bound to the writer. To handle overflow,
    + * the implementation must replace the buffer in the vector with a new
    + * one, saving the full vector to return as part of the final row batch.
    + * This puts the column in one of three states:
    + * <ul>
    + * <li>Normal: only one vector is of concern - the vector for the active
    + * row batch.</li>
    + * <li>Overflow: a write to a vector caused overflow. For all columns,
    + * the data buffer is shifted to a harvested vector, and a new, empty
    + * buffer is put into the active vector.</li>
    + * <li>Excess: a (small) column received values for the row that will
    + * overflow due to a later column. When overflow occurs, the excess
    + * column value, from the overflow record, resides in the active
    + * vector. It must be shifted from the active vector into the new
    + * overflow buffer.</li>
    + * </ul>
    + */
    +
    +public class TupleSetImpl implements TupleSchema {
    +
    +  public static class TupleLoaderImpl implements TupleLoader {
    +
    +    public TupleSetImpl tupleSet;
    +
    +    public TupleLoaderImpl(TupleSetImpl tupleSet) {
    +      this.tupleSet = tupleSet;
    +    }
    +
    +    @Override
    +    public TupleSchema schema() { return tupleSet; }
    +
    +    @Override
    +    public ColumnLoader column(int colIndex) {
    +      // TODO: Cache loaders here
    +      return tupleSet.columnImpl(colIndex).writer;
    +    }
    +
    +    @Override
    +    public ColumnLoader column(String colName) {
    +      ColumnImpl col = tupleSet.columnImpl(colName);
    +      if (col == null) {
    +        throw new UndefinedColumnException(colName);
    +      }
    +      return col.writer;
    +    }
    +
    +    @Override
    +    public TupleLoader loadRow(Object... values) {
    --- End diff --
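
    As background for the overflow discussion in the class javadoc above, the
    three column states can be pictured roughly as in the sketch below. This is
    an illustration only: the class, field, and method names are hypothetical
    placeholders, not the actual TupleSetImpl internals.

        /**
         * Hypothetical sketch of the Normal / Overflow / Excess handling
         * described in the class javadoc. All names are placeholders.
         */
        class OverflowSketch {

          private Object activeBuffer;     // buffer currently behind the writer's vector
          private Object harvestedBuffer;  // full buffer returned with the completed batch

          /** A write overflowed: roll the full buffer out, install an empty one. */
          void handleOverflow(boolean columnAlreadyHoldsOverflowRowValue) {
            // Overflow state: the full buffer joins the harvested batch; the
            // vector object stays the same (the writer is bound to it) but is
            // given a new, empty buffer.
            harvestedBuffer = activeBuffer;
            activeBuffer = allocateEmptyBuffer();

            if (columnAlreadyHoldsOverflowRowValue) {
              // Excess state: this (earlier) column already wrote its value for
              // the row that overflowed; shift that one value into the new buffer.
              copyLastValue(harvestedBuffer, activeBuffer);
            }
            // Back to the Normal state: only the active buffer matters from here on.
          }

          private Object allocateEmptyBuffer() { return new Object(); }
          private void copyLastValue(Object from, Object to) { /* placeholder */ }
        }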
    
    Type validation is done as the values wend their way through the writer
    tree. Eventually, we'll notice that the value is, say, an Integer and call
    the `setInt()` method on the writer. If the writer is really for, say, a
    VarChar, then an unsupported operation exception will be thrown at that
    point. Similarly, if the object type is not one we know how to parse, an
    exception is thrown there as well.
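
    As a concrete (though simplified) picture of that dispatch, a sketch
    follows. The classes below are illustrative stand-ins for the column-writer
    hierarchy, not the actual accessor code.

        // Simplified stand-in for the column-writer hierarchy; not the actual
        // Drill accessor classes.
        abstract class SketchColumnWriter {
          // Every setter rejects by default; concrete writers override only the
          // setters they support.
          void setInt(int value)       { throw new UnsupportedOperationException("setInt"); }
          void setString(String value) { throw new UnsupportedOperationException("setString"); }

          // Object-based entry point, as something like loadRow(Object... values)
          // would use.
          void setObject(Object value) {
            if (value instanceof Integer) {
              setInt((Integer) value);
            } else if (value instanceof String) {
              setString((String) value);
            } else {
              // Unknown object type: fails here, at write time.
              throw new UnsupportedOperationException(
                  "Unsupported value type: " + value.getClass().getName());
            }
          }
        }

        // A VarChar-style writer: passing an Integer to it lands in setInt(),
        // which throws UnsupportedOperationException, as described above.
        class SketchVarCharWriter extends SketchColumnWriter {
          @Override
          void setString(String value) { /* write the bytes to the vector */ }
        }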


> Implement size-aware result set loader
> --------------------------------------
>
>                 Key: DRILL-5657
>                 URL: https://issues.apache.org/jira/browse/DRILL-5657
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: Future
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: Future
>
>
> A recent extension to Drill's set of test tools created a "row set" 
> abstraction to allow us to create, and verify, record batches with very few 
> lines of code. Part of this work involved creating a set of "column 
> accessors" in the vector subsystem. Column readers provide a uniform API to 
> obtain data from columns (vectors), while column writers provide a uniform 
> writing interface.
> DRILL-5211 discusses a set of changes to limit value vectors to 16 MB in size 
> (to avoid memory fragmentation due to Drill's two memory allocators). The 
> column accessors have proven to be so useful that they will be the basis for 
> the new, size-aware writers used by Drill's record readers.
> A step in that direction is to retrofit the column writers to use the 
> size-aware {{setScalar()}} and {{setArray()}} methods introduced in 
> DRILL-5517.
> Since the test framework row set classes are (at present) the only consumer 
> of the accessors, those classes must also be updated with the changes.
> This then allows us to add a new "row mutator" class that handles size-aware 
> vector writing, including the case in which a vector fills in the middle of a 
> row.
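
The intended usage pattern for the size-aware loader can be pictured roughly
as follows. This is a hypothetical sketch: the interface and method names
(startRow(), saveRow(), harvestBatch(), and so on) are illustrative, not the
API being built here.

    // Hypothetical reader loop showing the intent of a size-aware loader:
    // write rows until a vector fills (possibly mid-row), hand off the
    // completed batch, and continue writing into the next one.
    // All names are illustrative only.
    interface SketchRowLoader {
      boolean startRow();                  // false when the current batch is full
      void setScalar(int col, Object value);
      void saveRow();
      Object harvestBatch();               // completed batch, within the size limit
    }

    class SketchReader {
      void readAll(Iterable<Object[]> source, SketchRowLoader loader) {
        for (Object[] row : source) {
          if (!loader.startRow()) {
            // Batch is full: ship it downstream, then start the row in the new
            // batch. A mid-row overflow is handled inside the loader, which
            // carries the partial row over to the next batch.
            send(loader.harvestBatch());
            loader.startRow();
          }
          for (int i = 0; i < row.length; i++) {
            loader.setScalar(i, row[i]);   // size-aware write
          }
          loader.saveRow();
        }
        send(loader.harvestBatch());       // final, partial batch
      }

      private void send(Object batch) { /* hand the batch to the downstream operator */ }
    }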


