[
https://issues.apache.org/jira/browse/DRILL-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325885#comment-15325885
]
John Omernik commented on DRILL-4286:
-------------------------------------
Ahh... Great point. Using seconds just to make things easy for others to
follow, the condition I missed was this:
Tx is the number of seconds
Q1 is our Query
N5 will be our node "to be drained"
T0 - Foreman initiates planning for a query. This includes all 10 nodes in a
cluster; N5 is included in this plan.
N5 Status: RUN (Accepting work)
Q1 Status: Planning
T1 - Drain of N5 is requested and started
N5 Status: DRAINING (Accepting work)
Q1 Status: Planning
T2 - Drain of N5 completes and N5 is now Drained/Offline
N5 Status: Drained (Not accepting work)
Q1 Status: Planning
T3 - Q1 Planning completes and work is sent
N5 Status: Drained (Not accepting work)
Q1 Status: Error because it tried to send work to a node in an offline state
User Status: Livid that their query failed
Admin Status: Exasperated because a user is mad
I missed this case, and I am glad you spotted it, thanks!
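To make the timeline concrete, here is a minimal sketch of the race in Python. It is not Drill code: NodeState, Node, and run_timeline are hypothetical names, and the "drain completes as soon as the node is idle" rule is an assumption standing in for the behavior described above. T1 and T2 collapse into one step because N5 has no running work when the drain is requested.

```python
from enum import Enum


class NodeState(Enum):
    RUN = "RUN"            # accepting work
    DRAINING = "DRAINING"  # still accepting work while running queries finish
    DRAINED = "DRAINED"    # offline, not accepting work


class Node:
    """Hypothetical stand-in for a drillbit; not a Drill class."""

    def __init__(self, name):
        self.name = name
        self.state = NodeState.RUN
        self.active_fragments = 0

    def start_drain(self):
        self.state = NodeState.DRAINING
        # Assumed naive rule: an idle draining node goes offline immediately.
        if self.active_fragments == 0:
            self.state = NodeState.DRAINED

    def submit_fragment(self):
        if self.state is NodeState.DRAINED:
            raise RuntimeError(f"{self.name} is offline")
        self.active_fragments += 1


def run_timeline():
    n5 = Node("N5")
    events = []
    # T0: the foreman plans Q1 while N5 is in RUN, so N5 is part of the plan.
    planned_nodes = [n5]
    events.append(("T0", n5.state.name, "Planning"))
    # T1/T2: drain is requested; N5 is idle, so it drains while Q1 still plans.
    n5.start_drain()
    events.append(("T2", n5.state.name, "Planning"))
    # T3: planning completes and the foreman sends work to the planned nodes.
    try:
        for node in planned_nodes:
            node.submit_fragment()
        q1_status = "Running"
    except RuntimeError:
        q1_status = "Error"
    events.append(("T3", n5.state.name, q1_status))
    return events


if __name__ == "__main__":
    for event in run_timeline():
        print(event)
```

The query fails at T3 only because the plan was made against a node state that is no longer true by the time work is sent.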
So looking at your options to address this, my gut reaction to the "stating
intentions" option is that it seems fairly complex, and could open the door for
issues. I don't want to remove it from options, but I have a personal
preference for simple solutions when possible.
This leads to the grace period option. So basically, we could set a "no work
grace period" option. When a node is draining, the node will accept work for
this grace period even if all current queries are complete. I think this
option could be a good way to approach things, but I'd lay out some caveats or
"additional" work to catch edge cases.
* The first thing I'd like to pose to the community: are there metrics for
planning to execution, i.e. what are some acceptable delays here? If I plan
something, does anyone have metrics on the average time from plan to work, or
plan to max(time for a fragment to be submitted)? On a VERY large
query/cluster, could this be extremely high? If we set it to, say, a 120 second
grace period, how common would it be for a plan made at T0 to NOT have any work
submitted in that 120 seconds? Metrics and user stories here would be good.
* The grace period would be an option in Drill and busy clusters could set this
higher.
* I think this grace period timer should ONLY start when all work on a bit is
complete. Thus any work that comes in during the grace period would reset the
grace timer back to the configured grace period, and that counter would start
ticking again when all work is once more complete. Remember, only queries that
were planned when the node was in a RUN state should actually be submitting work.
* Thinking about where the time gaps could be (and please correct me if I am
wrong): planning should almost always complete within a "grace period" time (a
long-running planning operation is one of those things that I think is avoided
in most cases); however, the work on a large query may not. I.e. the query
could be planned in that time, but a specific fragment may not be submitted to
the node until another fragment (say a slow one) is complete. That could lead
to an issue.
* So thinking about the previous point: once planning is complete, the foreman
knows all work that is "yet" to complete, so could a message be developed that
lets the foreman ping nodes to reset the timer? This ping interval would also
be an option for tuning on large or small clusters. Thus, a small message that
simply states, "You have work coming on this currently running query" could
reset the grace period timer. This could even be handled like an "empty
fragment", i.e. from the drillbit perspective, it's "work" that requires no
effort, but per a previous bullet, it resets the timer. This would require less
of a "reservation" or stating of intention that has to be recorded and handled
by the node; it just prolongs the grace period timer. (Consider the situation
where a foreman states to a node "You'll have work coming, don't shutdown" but
the foreman/query dies and doesn't clean up that intention. Will that node ever
go offline?) Basically, in this method, we allow the countdown timer from
draining to drained to be reset: accepted work resets it, and a simple message
from a foreman can reset it. But in the end, if no "foreman ping" or "accepted
work" comes in, the node WILL pull itself out of operation, cleaning up after
potential error situations.
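The timer rules in the bullets above can be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed Drill implementation: DrainingNode, grace_period_s, foreman_ping, and tick are all hypothetical names, and time is passed in explicitly (in seconds) so the behavior is easy to follow.

```python
class DrainingNode:
    """Hypothetical model of a drillbit with a resettable grace timer."""

    def __init__(self, grace_period_s):
        self.grace_period_s = grace_period_s  # assumed tunable option
        self.active_fragments = 0
        self.idle_since = None  # when the timer last (re)started, if running
        self.draining = False
        self.drained = False

    def start_drain(self, now):
        self.draining = True
        # The timer only runs while no work is in progress.
        if self.active_fragments == 0:
            self.idle_since = now

    def accept_fragment(self, now):
        self.tick(now)
        if self.drained:
            raise RuntimeError("node is drained and not accepting work")
        self.active_fragments += 1
        self.idle_since = None  # timer stops while work is running

    def finish_fragment(self, now):
        self.active_fragments -= 1
        if self.draining and self.active_fragments == 0:
            self.idle_since = now  # restart from the full grace period

    def foreman_ping(self, now):
        # "You have work coming": resets the timer, records no reservation.
        self.tick(now)
        if self.draining and not self.drained and self.active_fragments == 0:
            self.idle_since = now

    def tick(self, now):
        # The node takes itself offline once the grace period has elapsed
        # with no accepted work and no foreman ping.
        if (self.draining and not self.drained
                and self.idle_since is not None
                and now - self.idle_since >= self.grace_period_s):
            self.drained = True
        return self.drained


if __name__ == "__main__":
    node = DrainingNode(grace_period_s=120)
    node.start_drain(now=0)
    node.foreman_ping(now=110)     # resets the countdown
    print(node.tick(now=200))      # still within the grace period
    node.accept_fragment(now=220)  # work arrives; timer stops
    node.finish_fragment(now=300)  # timer restarts at the full period
    print(node.tick(now=420))      # grace period elapsed; node is drained
```

Because a ping only moves the timer and leaves nothing behind for the node to track, a foreman that dies simply stops pinging and the node still goes offline after one grace period.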
When my gut thinks about the stating of intentions, my worry is about
cleanup/other processes for tracking on the node. I like the grace period
option, but it needs some ways to be extended to catch edge cases. I think,
as you state Paul, it can be done with a bit of thought.
This is coming together. I really believe that while this may not be the
"neatest" of things to implement in Drill (I can squash bugs, or I can add
features, or I can improve performance... all sexier things to work on), the
importance of this feature cannot be overstated. Whether for a Drill standalone
cluster, a Mesos, or a Yarn implementation, this type of feature is critical to
a well-run, multi-tenant environment, especially those environments that have
strong compliance policies for patching, maintenance, etc., in that this
feature allows for integration into the standard/automated systems enterprises
have in place. This flexibility could be a feature that draws enterprises to
Drill.
> Have an ability to put server in quiescent mode of operation
> ------------------------------------------------------------
>
> Key: DRILL-4286
> URL: https://issues.apache.org/jira/browse/DRILL-4286
> Project: Apache Drill
> Issue Type: New Feature
> Components: Execution - Flow
> Reporter: Victoria Markman
>
> I think drill will benefit from mode of operation that is called "quiescent"
> in some databases.
> From IBM Informix server documentation:
> {code}
> Change gracefully from online to quiescent mode
> Take the database server gracefully from online mode to quiescent mode to
> restrict access to the database server without interrupting current
> processing. After you perform this task, the database server sets a flag that
> prevents new sessions from gaining access to the database server. The current
> sessions are allowed to finish processing. After you initiate the mode
> change, it cannot be canceled. During the mode change from online to
> quiescent, the database server is considered to be in Shutdown mode.
> {code}
> This is different from shutdown, when processes are terminated.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)