[GitHub] drill pull request #914: DRILL-5657: Size-aware vector writer structure

paul-rogers Tue, 14 Nov 2017 09:56:29 -0800

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/914#discussion_r150758630
  
    --- Diff: exec/vector/src/main/java/org/apache/drill/exec/vector/accessor/writer/BaseScalarWriter.java ---
    @@ -0,0 +1,264 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.vector.accessor.writer;
    +
    +import java.math.BigDecimal;
    +
    +import org.apache.drill.exec.vector.accessor.ColumnWriterIndex;
    +import org.apache.drill.exec.vector.accessor.impl.HierarchicalFormatter;
    +import org.joda.time.Period;
    +
    +/**
    + * Column writer implementation that acts as the basis for the
    + * generated, vector-specific implementations. All set methods
    + * throw an exception; subclasses simply override the supported
    + * method(s).
    + * <p>
    + * The only tricky part to this class is understanding the
    + * state of the write indexes as the write proceeds. There are
    + * two pointers to consider:
    + * <ul>
    + * <li>lastWriteIndex: The position in the vector at which the
    + * client last asked us to write data. This index is maintained
    + * in this class because it depends only on the actions of this
    + * class.</li>
    + * <li>vectorIndex: The position in the vector at which we will
    + * write if the client chooses to write a value at this time.
    + * The vector index is shared by all columns at the same repeat
    + * level. It is incremented as the client steps through the write
    + * and is observed in this class each time a write occurs.</li>
    + * </ul>
    + * A repeat level is defined as any of the following:
    + * <ul>
    + * <li>The set of top-level scalar columns, or those within a
    + * top-level, non-repeated map, or nested to any depth within
    + * non-repeated maps rooted at the top level.</li>
    + * <li>The values for a single scalar array.</li>
    + * <li>The set of scalar columns within a repeated map, or
    + * nested within non-repeated maps within a repeated map.</li>
    + * </ul>
    + * Items at a repeat level index together and share a vector
    + * index. However, the columns within a repeat level
    + * <i>do not</i> share a last write index: some can lag further
    + * behind than others.
    + * <p>
    + * Let's illustrate the states. Let's focus on one column and
    + * illustrate the four states that can occur during write:
    + * <ul>
    + * <li><b>Behind</b>: the last write index is more than one position behind
    + * the vector index. Zero-filling will be needed to catch up to
    + * the vector index.</li>
    + * <li><b>Written</b>: the last write index is the same as the vector
    + * index because the client wrote data at this position (and previous
    + * values were back-filled with nulls, empties or zeros.)</li>
    + * <li><b>Unwritten</b>: the last write index is one behind the vector
    + * index. This occurs when the column was written, then the client
    + * moved to the next row or array position.</li>
    + * <li><b>Restarted</b>: The current row is abandoned (perhaps filtered
    + * out) and is to be rewritten. The last write position moves
    + * back one position. Note that the Restarted state is
    + * indistinguishable from the unwritten state: the only real
    + * difference is that the current slot (pointed to by the
    + * vector index) contains the previous written value that must
    + * be overwritten or back-filled. But, this is fine, because we
    + * assume that unwritten values are garbage anyway.</li>
    + * </ul>
    + * To illustrate:<pre><code>
    + *      Behind      Written    Unwritten    Restarted
    + *       |X|          |X|         |X|          |X|
    + *   lw >|X|          |X|         |X|          |X|
    + *       | |          |0|         |0|     lw > |0|
    + *    v >| |  lw, v > |X|    lw > |X|      v > |X|
    + *                            v > | |
    + * </code></pre>
    + * The illustrated state transitions are:
    + * <ul>
    + * <li>Suppose the state starts in Behind.<ul>
    + *   <li>If the client writes a value, then the empty slot is
    + *       back-filled and the state moves to Written.</li>
    + *   <li>If the client does not write a value, the state stays
    + *       at Behind, and the gap of unfilled values grows.</li></ul></li>
    + * <li>When in the Written state:<ul>
    + *   <li>If the client saves the current row or array position,
    + *       the vector index increments and we move to the Unwritten
    + *       state.</li>
    + *   <li>If the client abandons the row, the last write position
    + *       moves back one to recreate the unwritten state. We've
    + *       shown this state separately above just to illustrate
    + *       the two transitions from Written.</li></ul></li>
    + * <li>When in the Unwritten (or Restarted) states:<ul>
    + *   <li>If the client writes a value, then the writer moves back to the
    + *       Written state.</li>
    + *   <li>If the client skips the value, then the vector index increments
    + *       again, leaving a gap, and the writer moves to the
    + *       Behind state.</li></ul></li>
    + * </ul>
    + * <p>
    + * We've already noted that the Restarted state is identical to
    + * the Unwritten state (and was discussed just to make the flow a bit
    + * clearer.) The astute reader will have noticed that the Behind state is
    + * the same as the Unwritten state if we define the combined state as
    + * when the last write position is behind the vector index.
    + * <p>
    + * Further, if
    + * one simply treats the gap between last write and the vector indexes
    + * as the amount (which may be zero) to back-fill, then there is just
    + * one state. This is, in fact, how the code works: it always writes
    + * to the vector index (and can do so multiple times for a single row),
    + * back-filling as necessary.
    + * <p>
    + * The states, then, are more for our use in understanding the algorithm.
    + * They are also very useful when working through the logic of performing
    + * a roll-over when a vector overflows.
    + */
    +
    +public abstract class BaseScalarWriter extends AbstractScalarWriter {
    +
    +  public static final int MIN_BUFFER_SIZE = 256;
    +
    +  /**
    +   * Indicates the position in the vector to write. Set via an object so that
    +   * all writers (within the same subtree) can agree on the write position.
    +   * For example, all top-level, simple columns see the same row index.
    +   * All columns within a repeated map see the same (inner) index, etc.
    +   */
    +
    +  protected ColumnWriterIndex vectorIndex;
    +
    +  /**
    +   * Listener invoked if the vector overflows. If not provided, then the writer
    +   * does not support vector overflow.
    +   */
    +
    +  protected ColumnWriterListener listener;
    +
    +  /**
    +   * Cached direct memory location of the start of data for the vector
    +   * being written. Updated each time the buffer is reallocated.
    +   */
    +
    +  protected long bufAddr;
    --- End diff --
    
    Very good question, and one that needs a longer answer than fits here. Basically, the thought is that these accessors are the primary interface between users of vectors and the backing memory buffers. `DrillBuf`, like the `ByteBuf` from which it derives, and the `ByteBuffer` on which it is modeled, assumes a serialization model. Here we assume more of a DB buffer model.
    
    The model used in the code is that `DrillBuf` handles allocation, reference counting, freeing and so on. The column accessors handle writes to, and reads from, the buffer using `PlatformDependent`.
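
    A minimal sketch of that split, under loud assumptions: `OwnedBuffer` and `IntWriterSketch` are hypothetical names, and a heap `ByteBuffer` stands in for both `DrillBuf` and the raw address. The real writer caches a `long bufAddr` and writes through `PlatformDependent`; the absolute `putInt` used here still bounds-checks, so this only illustrates who owns what and when the cached location is refreshed.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the buffer owner (the role DrillBuf plays):
// it only allocates and grows memory, it never interprets the contents.
class OwnedBuffer {
  private ByteBuffer buf = ByteBuffer.allocate(16);

  ByteBuffer raw() { return buf; }

  // Double the capacity, copying the existing bytes into the new buffer.
  void grow() {
    ByteBuffer bigger = ByteBuffer.allocate(buf.capacity() * 2);
    ByteBuffer src = buf.duplicate();
    src.clear();          // position = 0, limit = capacity
    bigger.put(src);
    buf = bigger;
  }
}

// Hypothetical stand-in for a column accessor: it caches the write
// location and computes offsets itself.
class IntWriterSketch {
  private static final int WIDTH = 4;   // bytes per int value
  private final OwnedBuffer owner;
  private ByteBuffer cached;            // plays the role of bufAddr

  IntWriterSketch(OwnedBuffer owner) {
    this.owner = owner;
    this.cached = owner.raw();
  }

  void setInt(int index, int value) {
    int offset = index * WIDTH;
    while (offset + WIDTH > cached.capacity()) {
      owner.grow();
      cached = owner.raw();             // refresh the cache after reallocation
    }
    cached.putInt(offset, value);       // absolute write at the computed offset
  }

  int getInt(int index) {
    return cached.getInt(index * WIDTH);
  }
}
```

    The point of the sketch is only the refresh inside `setInt`: the cached location is stale the moment the owner reallocates, which is why the field comment says it is updated each time the buffer is reallocated.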
    
    Calling `DrillBuf` methods without bounds checks is really little different from using `PlatformDependent` directly. Avoiding those extra calls has a performance benefit.
    
    FWIW, the text reader has long used memory addresses; that work is now isolated here, and removed (in a later PR) from the text reader (and other places).
