[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2056: DRILL-7701: EVF V2 Scan Framework

GitBox Sat, 18 Apr 2020 08:27:22 -0700

arina-ielchiieva commented on a change in pull request #2056: DRILL-7701: EVF V2 Scan Framework
URL: https://github.com/apache/drill/pull/2056#discussion_r410711733


 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/v3/lifecycle/ReaderLifecycle.java
 ##########
 @@ -0,0 +1,383 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.v3.lifecycle;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.exceptions.CustomErrorContext;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.RowBatchReader;
+import org.apache.drill.exec.physical.impl.scan.v3.ManagedReader;
+import org.apache.drill.exec.physical.impl.scan.v3.ScanLifecycleBuilder;
+import org.apache.drill.exec.physical.impl.scan.v3.SchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.v3.ManagedReader.EarlyEofException;
+import org.apache.drill.exec.physical.impl.scan.v3.lifecycle.OutputBatchBuilder.BatchSource;
+import org.apache.drill.exec.physical.impl.scan.v3.schema.ScanSchemaTracker;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.impl.ResultSetLoaderImpl;
+import org.apache.drill.exec.physical.resultSet.impl.ResultSetOptionBuilder;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Manages the schema and batch construction for a managed reader.
+ * Allows the reader itself to be as simple as possible. This class
+ * implements the basic {@link RowBatchReader} protocol based on
+ * three methods, and converts it to the two-method protocol of
+ * the managed reader. The {@code open()} call of the
+ * {@code RowBatchReader} is combined with the constructor of the
+ * {@link ManagedReader}, enforcing the rule that the managed reader
+ * is created just-in-time when it is to be used, which avoids
+ * accidentally holding resources for the life of the scan.
+ * <p>
+ * Coordinates the components that wrap a reader to create the final
+ * output batch:
+ * <ul>
+ * <li>The actual reader which loads (possibly a subset of) the
+ * columns requested from the input source.</li>
+ * <li>Implicit columns manager instance which populates implicit
+ * file columns, partition columns, and Drill's internal metadata
+ * columns.</li>
+ * <li>The missing columns handler which "makes up" values for projected
+ * columns not read by the reader.</li>
+ * <li>Batch assembler, which combines the three sources of vectors
+ * to create the output batch with the schema specified by the
+ * schema tracker.</li>
+ * </ul>
+ * <p>
+ * This class coordinates the reader-visible aspects of the scan:
+ * <ul>
+ * <li>The {@link SchemaNegotiator} (or subclass) which provides
+ * schema-related input to the reader and which creates the reader's
+ * {@link ResultSetLoader}, among other tasks. The schema negotiator
+ * is specific to each kind of scan and is thus created via the
+ * {@link ScanLifecycleBuilder}.</li>
+ * <li>The reader, which is designed to be as simple as possible,
+ * with all generic overhead tasks handled by this "shim" between
+ * the scan operator and the actual reader implementation.</li>
+ * </ul>
+ * <p>
+ * The reader is schema-driven. See {@link ScanSchemaTracker} for
+ * an overview.
+ * <ul>
+ * <li>The reader is given a <i>reader input schema</i>, via the
+ * schema negotiator, which specifies the desired output schema.
+ * The schema can be fully dynamic (a wildcard), fully defined (a
+ * prior reader already chose column types), or a hybrid.</li>
+ * <li>The reader can load a subset of columns. Those that are
+ * left out become "missing columns" to be filled in by this
+ * class.</li>
+ * <li>The <i>reader output schema</i>, along with implicit and missing
+ * columns, defines the scan's output schema.</li>
+ * </ul>
+ * <p>
+ * The framework handles the projection task so the
+ * reader does not have to worry about it. Reading an unwanted column
+ * is low cost: the result set loader will have provided a "dummy" column
+ * writer that simply discards the value. This is just as fast as having the
+ * reader use if-statements or a table to determine which columns to save.
+ */
+public class ReaderLifecycle implements RowBatchReader {
+  static final Logger logger = LoggerFactory.getLogger(ReaderLifecycle.class);
 
 Review comment:
   This logger should be declared `private`.
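
For readers following the class Javadoc quoted above, here is a minimal sketch of the "three-method to two-method" shim it describes. It is not Drill's code: `ThreeStepReader`, `TwoStepReader`, `LifecycleShim`, and `CountingReader` are hypothetical stand-ins for `RowBatchReader`, `ManagedReader`, and `ReaderLifecycle`, and the real signatures may differ. The only point illustrated is that `open()` builds the inner reader just in time, so nothing is held before the reader is actually used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical three-method protocol (stand-in for RowBatchReader).
interface ThreeStepReader {
  boolean open();
  boolean next();
  void close();
}

// Hypothetical two-method protocol (stand-in for ManagedReader).
// There is no open(): the constructor plays that role.
interface TwoStepReader {
  boolean next();
  void close();
}

// Shim that adapts the two-method reader to the three-method protocol.
final class LifecycleShim implements ThreeStepReader {
  private final Supplier<TwoStepReader> factory;
  private TwoStepReader reader; // null until open(), null again after close()

  LifecycleShim(Supplier<TwoStepReader> factory) {
    this.factory = factory;
  }

  @Override
  public boolean open() {
    // The reader is constructed here, not when the shim is created.
    reader = factory.get();
    return true;
  }

  @Override
  public boolean next() {
    if (reader == null) {
      throw new IllegalStateException("reader is not open");
    }
    return reader.next();
  }

  @Override
  public void close() {
    if (reader != null) {
      reader.close();
      reader = null;
    }
  }
}

// Toy reader that records its lifecycle events and yields a fixed
// number of batches.
final class CountingReader implements TwoStepReader {
  private final List<String> events;
  private int remaining;

  CountingReader(List<String> events, int batches) {
    this.events = events;
    this.remaining = batches;
    events.add("constructed");
  }

  @Override
  public boolean next() {
    if (remaining == 0) {
      return false;
    }
    remaining--;
    events.add("batch");
    return true;
  }

  @Override
  public void close() {
    events.add("closed");
  }
}
```

Creating a `LifecycleShim` with `() -> new CountingReader(events, 2)` leaves `events` empty until `open()` is called; after that, `next()` returns true twice and then false.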

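The last Javadoc paragraph, on projection and "dummy" column writers, can be illustrated the same way. This is a sketch under assumptions, not the `ResultSetLoader` API: `ValueWriter`, `VectorWriter`, `DummyWriter`, and `RowWriterSketch` are hypothetical names. The reader writes every column unconditionally, and unprojected columns receive a writer that discards the value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical column writer interface.
interface ValueWriter {
  void write(Object value);
}

// Writer for a projected column: keeps every value.
final class VectorWriter implements ValueWriter {
  final List<Object> values = new ArrayList<>();

  @Override
  public void write(Object value) {
    values.add(value);
  }
}

// Writer for an unprojected column: silently discards the value.
final class DummyWriter implements ValueWriter {
  @Override
  public void write(Object value) {
    // Intentionally empty.
  }
}

// Picks a real or dummy writer per column once, up front, so the
// reader needs no per-value projection checks.
final class RowWriterSketch {
  private final Map<String, ValueWriter> writers = new LinkedHashMap<>();

  RowWriterSketch(List<String> readerColumns, Set<String> projected) {
    for (String name : readerColumns) {
      writers.put(name,
          projected.contains(name) ? new VectorWriter() : new DummyWriter());
    }
  }

  // The reader asks for a writer by column name and writes unconditionally.
  ValueWriter column(String name) {
    return writers.get(name);
  }

  // Values actually saved for a column; empty if it was not projected.
  List<Object> saved(String name) {
    ValueWriter writer = writers.get(name);
    if (writer instanceof VectorWriter) {
      return ((VectorWriter) writer).values;
    }
    return Collections.emptyList();
  }
}
```

With reader columns `a` and `b` and only `a` projected, writing to both columns saves one value for `a` and none for `b`, while the calling code stays identical for both.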
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
