[ https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119209#comment-16119209 ]
ASF GitHub Bot commented on DRILL-5657:
---------------------------------------
Github user bitblender commented on a diff in the pull request:
https://github.com/apache/drill/pull/866#discussion_r131685349
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/TupleSetImpl.java ---
@@ -0,0 +1,551 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.rowSet.impl;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.physical.rowSet.ColumnLoader;
+import org.apache.drill.exec.physical.rowSet.TupleLoader;
+import org.apache.drill.exec.physical.rowSet.TupleSchema;
+import org.apache.drill.exec.physical.rowSet.impl.ResultSetLoaderImpl.VectorContainerBuilder;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.vector.AllocationHelper;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.VectorOverflowException;
+import org.apache.drill.exec.vector.accessor.impl.AbstractColumnWriter;
+import org.apache.drill.exec.vector.accessor.impl.ColumnAccessorFactory;
+
+/**
+ * Implementation of a column when creating a row batch.
+ * Every column resides at an index, is defined by a schema,
+ * is backed by a value vector, and is written to by a writer.
+ * Each column also tracks the schema version in which it was added
+ * to detect schema evolution. Each column has an optional overflow
+ * vector that holds overflow record values when a batch becomes
+ * full.
+ * <p>
+ * Overflow vectors require special consideration. The vector class itself
+ * must remain constant as it is bound to the writer. To handle overflow,
+ * the implementation must replace the buffer in the vector with a new
+ * one, saving the full vector to return as part of the final row batch.
+ * This puts the column in one of three states:
+ * <ul>
+ * <li>Normal: only one vector is of concern - the vector for the active
+ * row batch.</li>
+ * <li>Overflow: a write to a vector caused overflow. For all columns,
+ * the data buffer is shifted to a harvested vector, and a new, empty
+ * buffer is put into the active vector.</li>
+ * <li>Excess: a (small) column received values for the row that will
--- End diff --
'Excess' is the LOOK_AHEAD state, correct? I think it would be better if
the comments use the same terminology as in the code.
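To make the suggestion concrete, here is a minimal sketch of Javadoc that reuses the constant names from the code. Only LOOK_AHEAD comes from the comment above; the enum name ColumnState and the constants NORMAL and OVERFLOW are assumptions made for illustration and may not match the names in the pull request.

```java
/**
 * Hypothetical column states. Only LOOK_AHEAD is named in the review
 * comment; ColumnState, NORMAL and OVERFLOW are illustrative names.
 */
enum ColumnState {

  /** Only the vector for the active row batch is of concern. */
  NORMAL,

  /**
   * A write to a vector caused overflow: the data buffer is shifted to a
   * harvested vector and a new, empty buffer is put into the active vector.
   */
  OVERFLOW,

  /** The state that the class Javadoc under review calls "Excess". */
  LOOK_AHEAD
}
```

With the Javadoc keyed to the constant names, a reader searching the source for LOOK_AHEAD lands on both the code and its description.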
> Implement size-aware result set loader
> --------------------------------------
>
> Key: DRILL-5657
> URL: https://issues.apache.org/jira/browse/DRILL-5657
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: Future
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: Future
>
>
> A recent extension to Drill's set of test tools created a "row set"
> abstraction to allow us to create, and verify, record batches with very few
> lines of code. Part of this work involved creating a set of "column
> accessors" in the vector subsystem. Column readers provide a uniform API to
> obtain data from columns (vectors), while column writers provide a uniform
> writing interface.
> DRILL-5211 discusses a set of changes to limit value vectors to 16 MB in size
> (to avoid memory fragmentation due to Drill's two memory allocators). The
> column accessors have proven to be so useful that they will be the basis for
> the new, size-aware writers used by Drill's record readers.
> A step in that direction is to retrofit the column writers to use the
> size-aware {{setScalar()}} and {{setArray()}} methods introduced in
> DRILL-5517.
> Since the test framework row set classes are (at present) the only consumer
> of the accessors, those classes must also be updated with the changes.
> This then allows us to add a new "row mutator" class that handles size-aware
> vector writing, including the case in which a vector fills in the middle of a
> row.
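
The mid-row case described above can be sketched in a few lines. This is a toy model under stated assumptions, not Drill's API: BoundedColumn, RowWriterSketch and their methods are hypothetical names, columns hold plain integers, and the limit is counted in values rather than in bytes. It only shows the idea of harvesting the complete rows and carrying the in-flight row into the next batch when one column fills partway through a row.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a size-limited vector: holds at most `limit` values. */
class BoundedColumn {
  private final int limit;
  private List<Integer> values = new ArrayList<>();

  BoundedColumn(int limit) {
    if (limit < 1) {
      throw new IllegalArgumentException("limit must be at least 1");
    }
    this.limit = limit;
  }

  /** Refuses the write (returns false) instead of growing past the limit. */
  boolean tryAdd(int value) {
    if (values.size() >= limit) {
      return false;
    }
    values.add(value);
    return true;
  }

  /** Returns the first rowCount values and keeps any later ones. */
  List<Integer> harvest(int rowCount) {
    List<Integer> full = new ArrayList<>(values.subList(0, rowCount));
    values = new ArrayList<>(values.subList(rowCount, values.size()));
    return full;
  }
}

/** Hypothetical row writer that starts a new batch when a column fills mid-row. */
class RowWriterSketch {
  private final BoundedColumn[] columns;
  private final List<List<List<Integer>>> batches = new ArrayList<>();
  private int completeRows = 0;

  RowWriterSketch(int... limits) {
    columns = new BoundedColumn[limits.length];
    for (int i = 0; i < limits.length; i++) {
      columns[i] = new BoundedColumn(limits[i]);
    }
  }

  /** Writes one value per column; expects row.length to equal the column count. */
  void writeRow(int... row) {
    for (int i = 0; i < row.length; i++) {
      if (!columns[i].tryAdd(row[i])) {
        // Column i is full. Earlier columns already hold this row's value,
        // which harvestBatch() leaves behind for the next batch.
        harvestBatch();
        columns[i].tryAdd(row[i]);
      }
    }
    completeRows++;
  }

  /** Harvests any remaining complete rows and returns all batches. */
  List<List<List<Integer>>> finish() {
    harvestBatch();
    return batches;
  }

  private void harvestBatch() {
    if (completeRows == 0) {
      return;
    }
    List<List<Integer>> batch = new ArrayList<>();
    for (BoundedColumn column : columns) {
      batch.add(column.harvest(completeRows));
    }
    batches.add(batch);
    completeRows = 0;
  }
}

class RowWriterDemo {
  public static void main(String[] args) {
    RowWriterSketch writer = new RowWriterSketch(4, 2);
    writer.writeRow(1, 10);
    writer.writeRow(2, 20);
    writer.writeRow(3, 30); // second column is full: overflow in mid-row
    System.out.println(writer.finish()); // prints [[[1, 2], [10, 20]], [[3], [30]]]
  }
}
```

In the demo the third row overflows after its first column has already been written, so the first batch contains only the two complete rows and the third row moves, whole, to the second batch.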