[ 
https://issues.apache.org/jira/browse/DRILL-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285072#comment-16285072
 ] 

ASF GitHub Bot commented on DRILL-5963:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1051#discussion_r155938213
  
    --- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ---
    @@ -502,33 +438,82 @@ private void runFragment(List<PlanFragment> 
fragmentsList) throws ExecutionSetup
           }
         }
     
    +    assert rootFragment != null;
    +
         final FragmentRoot rootOperator;
         try {
           rootOperator = 
drillbitContext.getPlanReader().readFragmentRoot(rootFragment.getFragmentJson());
         } catch (IOException e) {
           throw new ExecutionSetupException(String.format("Unable to parse 
FragmentRoot from fragment: %s", rootFragment.getFragmentJson()));
         }
         queryRM.setCost(rootOperator.getCost());
    -    admit(null);
    -    drillbitContext.getWorkBus().addFragmentStatusListener(queryId, 
queryManager.getFragmentStatusListener());
    -    
drillbitContext.getClusterCoordinator().addDrillbitStatusListener(queryManager.getDrillbitStatusListener());
     
    -    logger.debug("Submitting fragments to run.");
    +    fragmentsRunner.setFragmentsInfo(planFragments, rootFragment, 
rootOperator);
     
    -    // set up the root fragment first so we'll have incoming buffers 
available.
    -    setupRootFragment(rootFragment, rootOperator);
    +    startQueryProcessing();
    +  }
     
    -    setupNonRootFragments(planFragments);
    +  /**
    +   * Enqueues the query and once enqueued, starts sending out query 
fragments for further execution.
    +   * Moves query to RUNNING state.
    +   */
    +  private void startQueryProcessing() {
    +    enqueue();
    +    runFragments();
    +    queryStateProcessor.moveToState(QueryState.RUNNING, null);
    +  }
     
    -    moveToState(QueryState.RUNNING, null);
    -    logger.debug("Fragments running.");
    +  /**
    +   * Move query to ENQUEUED state. Enqueues query if queueing is enabled.
    +   * Foreman run will be blocked until query is enqueued.
    +   * In case of failures (ex: queue timeout exception) will move query to 
FAILED state.
    +   */
    +  private void enqueue() {
    +    queryStateProcessor.moveToState(QueryState.ENQUEUED, null);
    +
    +    try {
    +      queryRM.admit();
    +      queryStateProcessor.moveToState(QueryState.STARTING, null);
    +    } catch (QueueTimeoutException | QueryQueueException e) {
    +      queryStateProcessor.moveToState(QueryState.FAILED, e);
    +    } finally {
    +      String queueName = queryRM.queueName();
    +      queryManager.setQueueName(queueName == null ? "Unknown" : queueName);
    --- End diff --
    
    I'm going to violate our code review guidelines a bit and discuss design. 
Sorry.
    
    The above is a good place to talk about state design. We have the Foreman 
taking actions and setting states. We have the query state processor taking 
some actions based on those states and trying to enforce valid state 
transitions. I think what we have is a clash of two systems.
    
    The Foreman is a big bunch of code that handles the procedure of parsing, 
preparing and launching a query, all in a single thread. The query state 
processor appears to handle not just this, but events that arrive after the 
query is started.
    
    I wonder rather than having the Foreman do the setup work and the query 
state processor try to follow the states, should we just bite the bullet and 
make the query state processor be the actual query owner and manager?
    
    The query manager provides events: parse, plan, enqueue, launch, etc. It 
uses states to make sure events occur in the expected order.
    
    All the Foreman does is create the environment, create the query manager, 
and call the set of methods that we currently expect of the Foreman (from parse 
to launch.)
    
    In the future, if we can get queuing to be non-blocking (works by events, 
not synchronized calls), then the Foreman would do planning through enqueueing. 
The "accept query" action would invoke the launch action.
    
    Then, the query manager not only enforces states, it also takes the actions 
required when an event arrives in a given state.
    
    Now, I realize all of the code in Foreman is existing. But, we are doing 
refactoring, so now is the time to think through these issues.


> Canceling a query hung in planning state, leaves the query in ENQUEUED state 
> for ever.
> --------------------------------------------------------------------------------------
>
>                 Key: DRILL-5963
>                 URL: https://issues.apache.org/jira/browse/DRILL-5963
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.12.0
>         Environment: Drill 1.12.0-SNAPSHOT, commit: 
> 4a718a0bd728ae02b502ac93620d132f0f6e1b6c
>            Reporter: Khurram Faraaz
>            Assignee: Arina Ielchiieva
>            Priority: Critical
>             Fix For: 1.13.0
>
>         Attachments: enqueued-2.png
>
>
> Canceling the below query that is hung in planning state, leaves the query in 
> ENQUEUED state for ever.
> Here is the query that is hung in planning state
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select 1 || ',' || 2 || ',' || 3 || ',' || 4 || 
> ',' || 5 || ',' || 6 || ',' || 7 || ',' || 8 || ',' || 9 || ',' || 0 || ',' 
> AS CSV_DATA from (values(1));
> +--+
> |  |
> +--+
> +--+
> No rows selected (304.291 seconds)
> {noformat}
> Explain plan for that query also just hangs.
> {noformat}
> explain plan for select 1 || ',' || 2 || ',' || 3 || ',' || 4 || ',' || 5 || 
> ',' || 6 || ',' || 7 || ',' || 8 || ',' || 9 || ',' || 0 || ',' AS CSV_DATA 
> from (values(1));
> ...
> {noformat}
> The above issues show the following problems:
> *1. Simple query with reasonable number of concat functions hangs.*
> In reality query does not hang it just take lots of time to execute. The root 
> cause is that during planning time DrillFuncHolderExpr return type is 
> extensively used to determine matching function, matching type etc. Though 
> this type is retrieved via 
> [getter|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/DrillFuncHolderExpr.java#L41]
>  in reality complex logic is executed beaneath it. For example for [concat 
> function|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/output/ConcatReturnTypeInference.java#L47].
>  Since function return type can not be changes during DrillFuncHolderExpr 
> life time, it is safe to cache it.
> *2. No mechanism to cancel query during ENQUEUED state.*
> Currently Drill does not have mechanism to cancel query before STARTING / 
> RUNNING states. Plus ENQUEUED state includes two PLANNING and ENQUEUED.
> Also submitting mechanism for submitting query to the queue is blocking, 
> making foreman wait till enqueueing is done Making it non-blocking will 
> prevent consuming threads that just sit idle in a busy system and also is 
> important when we move to a real admission control solution.
> The following changes were made to address above issues:
> a. two new states were added: PREPARING (when foreman is initialized) and 
> PLANNING (includes logical and / or physical planning).
> b. process of query enqueuing was made non-blocking. Once query was enqueued, 
> fragments runner is called to submit fragments locally and remotely.
> c. ability to cancel query during planning and enqueued states was added.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to