[GitHub] drill pull request #914: DRILL-5657: Size-aware vector writer structure

paul-rogers Sat, 16 Sep 2017 15:25:43 -0700

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/914#discussion_r139296614
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/package-info.java ---
    @@ -0,0 +1,295 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +/**
    + * Handles the details of the result set loader implementation.
    + * <p>
    + * The primary purpose of this loader, and the most complex to understand and
    + * maintain, is overflow handling.
    + *
    + * <h4>Detailed Use Cases</h4>
    + *
    + * Let's examine it by considering a number of
    + * use cases.
    + * <table style="border: 1px solid; border-collapse: collapse;">
    + * <tr><th>Row</th><th>a</th><th>b</th><th>c</th><th>d</th><th>e</th><th>f</th><th>g</th><th>h</th></tr>
    + * <tr><td>n-2</td><td>X</td><td>X</td><td>X</td><td>X</td><td>X</td><td>X</td><td>-</td><td>-</td></tr>
    + * <tr><td>n-1</td><td>X</td><td>X</td><td>X</td><td>X</td><td> </td><td> </td><td>-</td><td>-</td></tr>
    + * <tr><td>n  </td><td>X</td><td>!</td><td>O</td><td> </td><td>O</td><td> </td><td>O</td><td> </td></tr>
    + * </table>
    + * Here:
    + * <ul>
    + * <li>n-2, n-1, and n are rows. n is the overflow row.</li>
    + * <li>X indicates a value was written before overflow.</li>
    + * <li>Blank indicates no value was written in that row.</li>
    + * <li>! indicates the value that triggered overflow.</li>
    + * <li>O indicates a value written to the overflow row after overflow occurred.</li>
    + * <li>- indicates a column that did not exist prior to overflow.</li>
    + * </ul>
    + * Column a is written before overflow occurs, b causes overflow, and all other
    + * columns are either not written or written after overflow.
    + * <p>
    + * The scenarios, identified by column names above, are:
    + * <dl>
    + * <dt>a</dt>
    + * <dd>a contains values for all three rows.
    + * <ul>
    + * <li>Two values were written in the "main" batch, while a third was written to
    + * what becomes the overflow row.</li>
    + * <li>When overflow occurs, the last write position is at n. It must be moved
    + * back to n-1.</li>
    + * <li>Since data was written to the overflow row, it is copied to the
    + * lookahead batch.</li>
    + * <li>The last write position in the lookahead batch is 0 (since data was
    + * copied into the 0th row).</li>
    + * <li>When harvesting, no empty-filling is needed.</li>
    + * <li>When starting the next batch, the last write position must be set to 0 to
    + * reflect the presence of the value for row n.</li>
    + * </ul>
    + * </dd>
    + * <dt>b</dt>
    + * <dd>b contains values for all three rows. The value for row n triggers
    + * overflow.
    + * <ul>
    + * <li>The last write position is at n-1, which is kept for the "main"
    + * vector.</li>
    + * <li>A new overflow vector is created and starts empty, with the last write
    + * position at -1.</li>
    + * <li>Once created, b is immediately written to the overflow vector, advancing
    + * the last write position to 0.</li>
    + * <li>Harvesting and starting the next batch for column b work the same as for
    + * column a.</li>
    + * </ul>
    + * </dd>
    + * <dt>c</dt>
    + * <dd>Column c has values for all rows.
    + * <ul>
    + * <li>The value for row n is written after overflow.</li>
    + * <li>At overflow, the last write position is at n-1.</li>
    + * <li>At overflow, a new lookahead vector is created with the last write
    + * position at -1.</li>
    + * <li>The value of c is written to the lookahead vector, advancing the last
    + * write position to 0.</li>
    + * <li>Harvesting and starting the next batch for column c work the same as for
    + * column a.</li>
    + * </ul>
    + * </dd>
    + * <dt>d</dt>
    + * <dd>Column d writes values to the last two rows before overflow, but not to
    + * the overflow row.
    + * <ul>
    + * <li>The last write position for the main batch is at n-1.</li>
    + * <li>The last write position in the lookahead batch remains at -1.</li>
    + * <li>Harvesting for column d requires filling an empty value for row n-1.</li>
    + * <li>When starting the next batch, the last write position must be set to -1,
    + * indicating no data yet written.</li>
    + * </ul>
    + * </dd>
    + * <dt>f</dt>
    + * <dd>Column f has no data in the last position of the main batch, and no data
    + * in the overflow row.
    + * <ul>
    + * <li>The last write position is at n-2.</li>
    + * <li>An empty value must be written into position n-1 during harvest.</li>
    + * <li>On start of the next batch, the last write position starts at -1.</li>
    + * </ul>
    + * </dd>
    + * <dt>g</dt>
    + * <dd>Column g is added after overflow, and has a value written to the overflow
    + * row.
    + * <ul>
    + * <li>On harvest, column g is simply skipped.</li>
    + * <li>On start of the next row, the last write position can be left unchanged
    + * since no "exchange" was done.</li>
    + * </ul>
    + * </dd>
    + * <dt>h</dt>
    + * <dd>Column h is added after overflow, but does not have data written to it
    + * during the overflow row. This is similar to column g, except that the last
    + * write position starts at -1 for the next batch.</dd>
    + * </dl>
    + *
    + * <h4>General Rules</h4>
    + *
    + * The above can be summarized into a smaller set of rules:
    + * <p>
    + * At the time of overflow on row n:
    + * <ul>
    + * <li>Create or clear the lookahead vector.</li>
    + * <li>Copy (last write position - n) values from row n in the old vector to 0
    --- End diff --
    
    Fixed.
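For readers following the use cases in the diff above, here is a minimal sketch of the "last write position" bookkeeping. It is a simplified model under stated assumptions, not Drill's implementation. The class `OverflowBookkeeping` and its methods are hypothetical names. Only write positions are tracked, not vector contents. The model covers columns a, b, c, f, g, and h as described in the Javadoc. At overflow on row n, a column that already holds a value for row n rolls its main position back to n-1 and starts the lookahead at 0. Any other pre-existing column starts the lookahead at -1. Harvest fills the main batch with empties up to row n-1, and the next batch inherits the lookahead position.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of the "last write position" bookkeeping
 * in the overflow use cases. Names are illustrative, not Drill's API; only
 * positions are modeled, not vector contents.
 */
public class OverflowBookkeeping {

  /** Write-position state for one column. */
  static final class ColumnState {
    final boolean addedAfterOverflow;
    int mainLastWrite;            // last row written in the "main" batch
    int lookaheadLastWrite = -1;  // last row written in the lookahead batch

    ColumnState(int mainLastWrite, boolean addedAfterOverflow) {
      this.mainLastWrite = mainLastWrite;
      this.addedAfterOverflow = addedAfterOverflow;
    }
  }

  /**
   * Overflow on row n for a column that existed before overflow. If the
   * column already holds a value for row n (case a), that value is assumed
   * to move to lookahead row 0 and the main position rolls back to n-1;
   * otherwise the lookahead vector starts empty.
   */
  static void onOverflow(ColumnState col, int n) {
    if (col.mainLastWrite == n) {
      col.mainLastWrite = n - 1;
      col.lookaheadLastWrite = 0;
    } else {
      col.lookaheadLastWrite = -1;
    }
  }

  /** A write to the overflow row after overflow lands in lookahead row 0. */
  static void writeOverflowRow(ColumnState col) {
    col.lookaheadLastWrite = 0;
  }

  /** Empty values harvest must fill so the main batch covers rows 0..n-1. */
  static int emptiesToFill(ColumnState col, int n) {
    // Columns added after overflow are skipped at harvest.
    return col.addedAfterOverflow ? 0 : (n - 1) - col.mainLastWrite;
  }

  /** Last write position carried into the next batch. */
  static int nextBatchLastWrite(ColumnState col) {
    return col.lookaheadLastWrite;
  }

  public static void main(String[] args) {
    int n = 5; // hypothetical overflow row
    Map<String, ColumnState> cols = new LinkedHashMap<>();
    cols.put("a", new ColumnState(n, false));     // row n written before overflow
    cols.put("b", new ColumnState(n - 1, false)); // row n write triggers overflow
    cols.put("c", new ColumnState(n - 1, false)); // row n written after overflow
    cols.put("f", new ColumnState(n - 2, false)); // nothing in rows n-1 or n
    for (ColumnState col : cols.values()) {
      onOverflow(col, n);
    }

    // Columns g and h are created only after overflow.
    cols.put("g", new ColumnState(-1, true));
    cols.put("h", new ColumnState(-1, true));

    // b, c, and g receive values for the overflow row.
    writeOverflowRow(cols.get("b"));
    writeOverflowRow(cols.get("c"));
    writeOverflowRow(cols.get("g"));

    for (Map.Entry<String, ColumnState> e : cols.entrySet()) {
      ColumnState col = e.getValue();
      System.out.printf("%s: emptiesToFill=%d, nextBatchLastWrite=%d%n",
          e.getKey(), emptiesToFill(col, n), nextBatchLastWrite(col));
    }
  }
}
```

With n = 5, the sketch reports one empty to fill for f and none for the other columns. The next-batch position is 0 for a, b, c, and g, and -1 for f and h. Tracking positions alone keeps the model small. A real writer would also have to copy the row-n value itself into the lookahead vector.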
