[GitHub] drill pull request #1121: DRILL-6153: Operator framework

paul-rogers Thu, 15 Feb 2018 21:56:52 -0800

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1121#discussion_r168674650
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/protocol/OperatorDriver.java ---
    @@ -0,0 +1,183 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.physical.impl.protocol;
    +
    +import org.apache.drill.common.exceptions.UserException;
    +import org.apache.drill.exec.ops.OperatorContext;
    +import org.apache.drill.exec.record.RecordBatch.IterOutcome;
    +
    +/**
    + * State machine that drives the operator executable. Converts
    + * between the iterator protocol and the operator executable protocol.
    + * Implemented as a separate class in anticipation of eventually
    + * changing the record batch (iterator) protocol.
    + */
    +
    +public class OperatorDriver {
    +  public enum State { START, SCHEMA, RUN, END, FAILED, CLOSED }
    --- End diff --
    
    Thanks for the questions. Let's take them one-by-one.
    
    The model here is that an operator follows the "fast schema" pattern:
    
    * The first call to `next()` produces an empty batch with only the schema.
    * The second call to `next()` returns the first data batch.
    
    The states track where the operator is in that sequence:
    
    * `START`: The operator has been created, but `next()` has not yet been 
called. When `next()` is called in the `START` state, return just the schema.
    * `SCHEMA`: Only the schema has been returned. On the next call to `next()`, 
return the data (if any) associated with the first batch.
    * `RUN`: Normal state for the second and subsequent `next()` calls.
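    
    As a rough illustration of these transitions (not the code in this PR), 
here is a minimal, self-contained sketch. The class `FastSchemaSketch`, the 
nested `Driver` class, and the simplified `Outcome` enum are hypothetical 
stand-ins for illustration only: the real driver works with `IterOutcome` and 
an operator executable, and also has the `FAILED` and `CLOSED` states, which 
are omitted here.

```java
// Sketch only: hypothetical names, not the OperatorDriver from this PR.
final class FastSchemaSketch {

  // Subset of the states in the diff above (FAILED and CLOSED omitted).
  enum State { START, SCHEMA, RUN, END }

  // Simplified, assumed stand-in for the iterator protocol's outcomes.
  enum Outcome { OK_NEW_SCHEMA, OK, NONE }

  static final class Driver {
    private State state = State.START;
    private int remainingDataBatches;

    Driver(int dataBatches) {
      this.remainingDataBatches = dataBatches;
    }

    State state() {
      return state;
    }

    Outcome next() {
      switch (state) {
        case START:
          // First call: return an empty batch that carries only the schema.
          state = State.SCHEMA;
          return Outcome.OK_NEW_SCHEMA;
        case SCHEMA:
        case RUN:
          // Second and later calls: return data batches until none remain.
          if (remainingDataBatches == 0) {
            state = State.END;
            return Outcome.NONE;
          }
          remainingDataBatches--;
          state = State.RUN;
          return Outcome.OK;
        default:
          // END: nothing more to return.
          return Outcome.NONE;
      }
    }
  }

  public static void main(String[] args) {
    Driver driver = new Driver(2);
    Outcome outcome;
    do {
      outcome = driver.next();
      System.out.println(outcome + " -> " + driver.state());
    } while (outcome != Outcome.NONE);
  }
}
```

    In this sketch the only difference between `SCHEMA` and `RUN` is that 
`SCHEMA` records that no data batch has been returned yet. Both states handle 
`next()` the same way.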
    
    Now, do we need "fast schema"? Maybe not. I *thought* that Drill was 
designed to return the schema quickly to the client before waiting for the 
first data batch. But, in subsequent testing, I discovered that few queries 
actually worked that way. (Some tests counted the returned batches and asserted 
that there should be only one, carrying both data and schema...)
    
    So, if we want "fast schema" then we need the three states. But, if we want 
the original behavior, then we can, in fact, remove the `SCHEMA` state.
    
    Was there a reason for the "fast schema" path? Or, was that just a vestige 
of a never-completed feature?
