[ 
https://issues.apache.org/jira/browse/DRILL-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325018#comment-15325018
 ] 

Paul Rogers commented on DRILL-4286:
------------------------------------

We'd still have a race condition: if Foremen keep sending queries to a node in 
DRAINING, the node will never transition to DRAINED on a busy system. I like 
the time-based idea, but it does require synchronized clocks. And, it creates 
an "unlimited liability" for a Drillbit: how would a Drillbit know that some 
(overloaded) Foreman is cranking away on a plan that will eventually be 
submitted under the "grandfather in" policy? In the extreme case, the "victim" 
Drillbit will wait, assume no new queries are coming, and exit. Later the 
(overloaded) Foreman will try to submit a fragment and fail. We'd either fail 
the query or have to replay. So, we're back where we started.

Talking to the other developers, it seems we allocate query fragments to 
drillbits in the parallelization phase, which occurs after emerging from the 
(optional) queue. So, there is no race condition when sitting in the queue, 
which was a worry. Still, there is a race condition between getting the list of 
"active drill bits" (which would be filtered by state in the new world), and 
submitting the query.
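
To make that last race concrete, here is a minimal sketch of it as a
check-then-use problem. The Drillbit class, the state names as a Python enum,
and the active_drillbits() helper are hypothetical stand-ins for illustration,
not Drill's actual classes or APIs.

```python
from enum import Enum


class State(Enum):
    RUN = "RUN"
    DRAINING = "DRAINING"
    DRAINED = "DRAINED"


class Drillbit:
    """Hypothetical stand-in for a Drillbit; not Drill's real class."""

    def __init__(self, name):
        self.name = name
        self.state = State.RUN

    def submit_fragment(self, fragment):
        # In this sketch a node that has left RUN refuses new work outright.
        if self.state is not State.RUN:
            raise RuntimeError(
                f"{self.name} is {self.state.value}; rejected {fragment}")
        return f"{self.name} accepted {fragment}"


def active_drillbits(cluster):
    # Time of check: the Foreman filters the endpoint list by state.
    return [d for d in cluster if d.state is State.RUN]


cluster = [Drillbit("bit-1"), Drillbit("bit-2")]
chosen = active_drillbits(cluster)   # Foreman plans against this snapshot
cluster[1].state = State.DRAINING    # bit-2 is drained while planning runs

results = []
for bit in chosen:                   # time of use: the snapshot is now stale
    try:
        results.append(bit.submit_fragment("frag-0"))
    except RuntimeError as e:
        results.append(f"FAILED: {e}")
```

Filtering by state narrows the window but cannot close it, because the state
can change at any point between the filter and the submit.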

Three potential solutions are:

1. Drillbits offer a grace period after transitioning to DRAINING. Say, 30 
seconds in which foremen can submit their in-flight queries. (As discussed 
above.)

2. A "reserve" message that says, "I'm going to send you a query, just hold on 
a bit while I finish planning." Reservations are preserved across RUN --> 
DRAINING transitions.

3. The try/fail/retry mechanism suggested in the previous note.
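
As an illustration of option 2, here is a rough sketch of how reservations
could be carried across the RUN --> DRAINING transition. Everything in it (the
Drillbit class, reserve(), submit(), complete(), the integer reservation ids)
is a hypothetical design for discussion, not an existing Drill API.

```python
class Drillbit:
    """Hypothetical model of a Drillbit that honors reservations."""

    def __init__(self):
        self.state = "RUN"
        self.reservations = set()   # promised by Foremen, not yet submitted
        self.running = set()        # fragments currently executing
        self._next_id = 0

    def reserve(self):
        # New reservations are granted only while the node is in RUN.
        if self.state != "RUN":
            return None
        self._next_id += 1
        self.reservations.add(self._next_id)
        return self._next_id

    def drain(self):
        self.state = "DRAINING"
        self._maybe_finish()

    def submit(self, reservation_id, fragment):
        # A valid reservation is honored even after the node starts draining.
        if reservation_id not in self.reservations:
            raise RuntimeError("no reservation; fragment rejected")
        self.reservations.discard(reservation_id)
        self.running.add(fragment)

    def complete(self, fragment):
        self.running.discard(fragment)
        self._maybe_finish()

    def _maybe_finish(self):
        # DRAINED only when nothing is promised and nothing is running.
        if (self.state == "DRAINING"
                and not self.reservations and not self.running):
            self.state = "DRAINED"


bit = Drillbit()
rid = bit.reserve()             # Foreman reserves, then goes off to plan
bit.drain()                     # admin requests quiescence mid-planning
state_after_drain = bit.state   # still DRAINING: one reservation is open
late = bit.reserve()            # a new Foreman is turned away
bit.submit(rid, "frag-0")       # the reserved query is still accepted
bit.complete("frag-0")
final_state = bit.state
```

The sketch does not handle a Foreman that reserves and then never submits.
That case would presumably need some form of expiry, which is left out here.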

As discussed above, the grace period will work most of the time. The problem 
is that, if the cluster is overloaded, things are slow and planning may exceed 
the grace period. We'd then still need solution 2 or 3 to handle the (rare) 
cases where the grace period expires.

So, the choice is either 2 (by itself), or 3, optionally combined with 1 to 
minimize the number of retries.
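
The second choice (3 plus 1) could look roughly like the sketch below. All
names in it (Drillbit, FakeClock, Rejected, run_query, the plan callback) are
made up for illustration and the 30-second default is just the example value
from above. The grace period is measured from the moment the Drillbit itself
enters DRAINING. A single simulated clock stands in for real time so the
example is deterministic.

```python
class Rejected(Exception):
    """Raised when a Drillbit refuses a fragment."""


class FakeClock:
    """Simulated time source so the example does not sleep."""

    def __init__(self):
        self.now = 0.0

    def advance(self, seconds):
        self.now += seconds


class Drillbit:
    def __init__(self, name, clock, grace_seconds=30.0):
        self.name = name
        self.clock = clock
        self.grace_seconds = grace_seconds
        self.draining_since = None   # None means the node is in RUN

    def drain(self):
        self.draining_since = self.clock.now

    def is_running(self):
        return self.draining_since is None

    def submit(self, fragment):
        # Option 1: in-flight queries are accepted during the grace period.
        if self.draining_since is not None:
            if self.clock.now - self.draining_since > self.grace_seconds:
                raise Rejected(self.name)
        return f"{self.name} accepted {fragment}"


def run_query(cluster, fragment, plan, max_attempts=3):
    # Option 3: on rejection, pick endpoints again, replan, and resubmit.
    for attempt in range(1, max_attempts + 1):
        candidates = [d for d in cluster if d.is_running()]
        if not candidates:
            break
        target = candidates[0]
        plan(target)                 # planning takes (simulated) time
        try:
            return target.submit(fragment), attempt
        except Rejected:
            continue
    raise RuntimeError("query failed: no Drillbit accepted the fragment")


def simulate(planning_seconds):
    clock = FakeClock()
    cluster = [Drillbit("bit-1", clock), Drillbit("bit-2", clock)]

    def plan(target):
        # bit-1 starts draining just as the first planning pass begins.
        if cluster[0].is_running():
            cluster[0].drain()
        clock.advance(planning_seconds)

    return run_query(cluster, "frag-0", plan)


fast = simulate(planning_seconds=5)    # planning fits in the grace period
slow = simulate(planning_seconds=45)   # planning outlasts the grace period
```

In the fast case the grace period absorbs the race and no retry is needed. In
the slow case the first submit is rejected and the retry lands on the other
node.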

> Have an ability to put server in quiescent mode of operation
> ------------------------------------------------------------
>
>                 Key: DRILL-4286
>                 URL: https://issues.apache.org/jira/browse/DRILL-4286
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Execution - Flow
>            Reporter: Victoria Markman
>
> I think Drill would benefit from a mode of operation that is called 
> "quiescent" in some databases. 
> From IBM Informix server documentation:
> {code}
> Change gracefully from online to quiescent mode
> Take the database server gracefully from online mode to quiescent mode to 
> restrict access to the database server without interrupting current 
> processing. After you perform this task, the database server sets a flag that 
> prevents new sessions from gaining access to the database server. The current 
> sessions are allowed to finish processing. After you initiate the mode 
> change, it cannot be canceled. During the mode change from online to 
> quiescent, the database server is considered to be in Shutdown mode.
> {code}
> This is different from shutdown, when processes are terminated. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
