[ 
https://issues.apache.org/jira/browse/DRILL-4596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15625714#comment-15625714
 ] 

Arina Ielchiieva commented on DRILL-4596:
-----------------------------------------

Copying Paul's and Jacques's comments from the PR:
*Paul*
Great issue! Getting version compatibility working will be a huge benefit to 
users.

But, as it turns out, version compatibility is a complex issue. Do we have a
design document that explains our goals and policy? Is the goal to allow
rolling updates of clients (say, the Drill server is at 1.8, clients are
upgraded gradually from 1.7 to 1.8, and the old clients keep working)? Or is
it to allow rolling upgrades of servers? Both?

MapR customers receive even releases: 1.4, 1.6, 1.8. Would the +/-1 policy
benefit them, given that consecutive releases they receive are two minor
versions apart?

As I understand it, each Drillbit is to work with another Drillbit within +/-1
version. But what if I bring up Drill 1.6, Drill 1.5 and Drill 1.4? The 1.5
Drillbit is happy to work with both 1.6 and 1.4, but the 1.6 and 1.4 Drillbits
will fail only when they communicate with one another. When will this
communication occur? At startup? Or only later when, say, 1.6 tries to send a
query to 1.4?
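A minimal sketch of the "+/-1" rule as a pairwise check, assuming versions are plain `major.minor[.patch]` strings and that "+/-1" means "same major version, minor versions at most one apart". The class and method names are hypothetical, not Drill code.

```java
// Hypothetical "+/-1 minor version" rule between two Drillbits.
// Version format and the rule itself are assumptions for illustration.
final class MinorVersionWindow {

    /** Parses "major.minor[.patch]" and returns {major, minor}. */
    static int[] majorMinor(String version) {
        String[] parts = version.split("\\.");
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    /** True if both versions share a major version and their minors differ by at most 1. */
    static boolean compatible(String a, String b) {
        int[] x = majorMinor(a);
        int[] y = majorMinor(b);
        return x[0] == y[0] && Math.abs(x[1] - y[1]) <= 1;
    }

    public static void main(String[] args) {
        System.out.println(compatible("1.5.0", "1.6.0")); // true
        System.out.println(compatible("1.5.0", "1.4.0")); // true
        System.out.println(compatible("1.6.0", "1.4.0")); // false: the relation is not transitive
    }
}
```

Because the relation is not transitive, a check that compares a new node against only one peer can admit a 1.4/1.5/1.6 mix. The same rule also rejects 1.4 against 1.6, which is why consecutive even releases would fall outside the window.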

Does this mean that Drillbits should advertise their version in ZooKeeper so 
that we fail fast and can provide a clear error message? (DRILL-4543)
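One way to fail fast is to check the new Drillbit against every version already advertised before it registers. The sketch below assumes such advertisement exists. A `Map` of hypothetical node names to versions stands in for the ZooKeeper registry, and the same hypothetical +/-1 rule is used; no ZooKeeper API is shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical startup check: compare this Drillbit's version with every version
// other Drillbits advertise (the Map stands in for a ZooKeeper registry).
final class StartupVersionCheck {

    /** Same major version and minor versions at most one apart (assumed policy). */
    static boolean compatible(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        return x[0].equals(y[0])
            && Math.abs(Integer.parseInt(x[1]) - Integer.parseInt(y[1])) <= 1;
    }

    /** Throws before registration, naming every incompatible peer. */
    static void checkBeforeRegistering(String myVersion, Map<String, String> advertised) {
        List<String> conflicts = new ArrayList<>();
        for (Map.Entry<String, String> peer : advertised.entrySet()) {
            if (!compatible(myVersion, peer.getValue())) {
                conflicts.add(peer.getKey() + " (" + peer.getValue() + ")");
            }
        }
        if (!conflicts.isEmpty()) {
            throw new IllegalStateException("Drillbit " + myVersion
                + " is incompatible with running Drillbits: " + String.join(", ", conflicts));
        }
    }

    public static void main(String[] args) {
        Map<String, String> advertised = new LinkedHashMap<>();
        advertised.put("node-a", "1.5.0"); // hypothetical peers and versions
        advertised.put("node-b", "1.4.0");
        try {
            checkBeforeRegistering("1.6.0", advertised);
            System.out.println("OK to register");
        } catch (IllegalStateException e) {
            // Drillbit 1.6.0 is incompatible with running Drillbits: node-b (1.4.0)
            System.out.println(e.getMessage());
        }
    }
}
```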

Dremio proposes a new 2.0 release that breaks compatibility. Will Drill 1.9
(say) be compatible with the (proposed) incompatible Drill 2.0? Should it be?
How should we make that work?

As others have said, we need to consider wire protocol and semantics. The usual
solution is protocol negotiation. If a 1.6 client connects to a 1.7 server,
they agree to "speak" 1.6. If a 1.7 client connects to a 1.6 server, they also
agree to "speak" 1.6. Such a solution affects our messaging layer and
increases testing requirements.
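Protocol negotiation as described can be sketched as "each side advertises the range of protocol versions it speaks, and both use the highest version they share". The integer protocol numbers and the mapping of releases to them are hypothetical; this is not Drill's RPC handshake.

```java
import java.util.OptionalInt;

// Hypothetical protocol negotiation: pick the highest version in the overlap of
// the client's and server's supported ranges, or refuse if they do not overlap.
final class ProtocolNegotiation {

    /** Highest protocol version both [min, max] ranges contain, or empty if disjoint. */
    static OptionalInt negotiate(int clientMin, int clientMax, int serverMin, int serverMax) {
        int low = Math.max(clientMin, serverMin);
        int high = Math.min(clientMax, serverMax);
        return low <= high ? OptionalInt.of(high) : OptionalInt.empty();
    }

    static String describe(OptionalInt agreed) {
        return agreed.isPresent() ? "speak protocol " + agreed.getAsInt() : "refuse connection";
    }

    public static void main(String[] args) {
        // A "1.6" client (protocols 5-6) talking to a "1.7" server (protocols 6-7).
        System.out.println(describe(negotiate(5, 6, 6, 7))); // speak protocol 6
        // A "1.7" client talking to a "1.6" server: same outcome.
        System.out.println(describe(negotiate(6, 7, 5, 6))); // speak protocol 6
        // A hypothetical incompatible "2.0" client (protocols 8-9).
        System.out.println(describe(negotiate(8, 9, 5, 6))); // refuse connection
    }
}
```

Every protocol version inside a supported range is a code path that has to be exercised, so each extra version a side keeps in its range adds to the test matrix.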

Drill-on-YARN will provide another way to do server upgrades (ramp up a new
cluster while ramping down the old one). Otherwise, YARN will need some way to
run the same cluster, replacing version X Drillbits with version X+1 Drillbits
(while still running the version X Application Master).

If I may suggest, this is a complex topic. This is much more than a code change 
to compare version numbers. Perhaps we should work out the goals and issues in 
a design document.

Some other design issues. The idea of a rolling upgrade presupposes that we
can shut down a Drillbit, bring up a new one, and the cluster keeps running.
But, today, bringing down a Drillbit causes all in-flight queries on that node
to fail. There is no way to mark a node as "quiescent" (up, but not accepting
new work). So, a rolling upgrade today would entail a long series of query
failures as we replace each of, say, 20 or 50 nodes. So, in fact, it is less
disruptive to take the cluster down, push an upgrade, and bring it back up.
(See DRILL-4286.)
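A "quiescent" state could look roughly like the sketch below: the node stays up and finishes its running queries, the scheduler stops giving it new work, and it becomes safe to stop once nothing is in flight. The states, the `Node` class and the scheduling rule are hypothetical illustrations, not Drill's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical quiescent state for graceful Drillbit shutdown.
final class Quiesce {

    enum State { ONLINE, QUIESCENT, OFFLINE }

    static final class Node {
        final String name;
        State state = State.ONLINE;
        int inFlightQueries;

        Node(String name) { this.name = name; }

        /** Stop accepting new work but let running queries finish. */
        void quiesce() { state = State.QUIESCENT; }

        /** Safe to stop the process only once nothing is running on it. */
        boolean readyToShutDown() { return state == State.QUIESCENT && inFlightQueries == 0; }
    }

    /** Only ONLINE nodes are candidates for new query fragments. */
    static List<String> schedulable(List<Node> nodes) {
        List<String> names = new ArrayList<>();
        for (Node n : nodes) {
            if (n.state == State.ONLINE) names.add(n.name);
        }
        return names;
    }

    public static void main(String[] args) {
        Node a = new Node("node-a");
        Node b = new Node("node-b");
        b.inFlightQueries = 2;
        b.quiesce();
        System.out.println(schedulable(List.of(a, b))); // [node-a]
        System.out.println(b.readyToShutDown());        // false: 2 queries still running
        b.inFlightQueries = 0;
        System.out.println(b.readyToShutDown());        // true
    }
}
```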

Back on testing: testing is essential. A feature that allows +/-1 version
compatibility is not helpful unless someone (other than the user) can certify
that it works. If the user has to do the checking, then it is not very
helpful: it is safer just to do a full upgrade.

To emphasize an earlier point: there are two distinct issues. One is a managed
cluster upgrade (the admin can do it with the help of a management tool). The
other is the many Drill clients spread across desktops: that is a classic
desktop software upgrade. Some might be on planes, others locked in desks while
someone is on vacation. Let's think about how to upgrade JDBC drivers and the
like given this reality.

Is the compatibility policy version-number-based or time-based? As an admin,
can I expect to have a three-month window for upgrades? Or will it sometimes be
one month, other times four months, depending on who changes what? Should we
have a time-based policy?
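A time-based policy makes the admin's window explicit: an older version stays supported until a fixed number of months after the newer release ships, regardless of how many version numbers were released in between. A minimal sketch, assuming a three-month window and a hypothetical release date:

```java
import java.time.LocalDate;

// Hypothetical time-based compatibility policy: an older version remains supported
// until a fixed window has passed since the newer release shipped.
final class TimeBasedPolicy {

    /** Last day on which the previous version is still supported. */
    static LocalDate upgradeDeadline(LocalDate newerRelease, int windowMonths) {
        return newerRelease.plusMonths(windowMonths);
    }

    static boolean olderVersionStillSupported(LocalDate newerRelease, LocalDate today, int windowMonths) {
        return !today.isAfter(upgradeDeadline(newerRelease, windowMonths));
    }

    public static void main(String[] args) {
        LocalDate newRelease = LocalDate.of(2016, 11, 1); // hypothetical release date
        System.out.println(upgradeDeadline(newRelease, 3)); // 2017-02-01
        System.out.println(olderVersionStillSupported(newRelease, LocalDate.of(2017, 1, 15), 3)); // true
        System.out.println(olderVersionStillSupported(newRelease, LocalDate.of(2017, 3, 1), 3));  // false
    }
}
```

Under a version-number policy, by contrast, the same window would stretch or shrink with the release cadence.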

*Jacques*
In general, I'm against version number checking. We did that in the code early
on, but we should be moving towards a capabilities-flag approach.
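A capabilities-flag approach can be sketched as each peer advertising the set of features it supports, with a feature used only when both sides list it. The `Capability` names below are hypothetical placeholders, not flags Drill defines.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical capability negotiation: compare feature sets instead of version numbers.
final class Capabilities {

    enum Capability { COMPRESSED_BATCHES, ASYNC_CANCEL, EXTENDED_METADATA }

    /** Features both peers support; only these may be used on the connection. */
    static Set<Capability> common(Set<Capability> mine, Set<Capability> theirs) {
        EnumSet<Capability> shared = EnumSet.noneOf(Capability.class);
        shared.addAll(mine);
        shared.retainAll(theirs);
        return shared;
    }

    public static void main(String[] args) {
        Set<Capability> newer = EnumSet.allOf(Capability.class);
        Set<Capability> older = EnumSet.of(Capability.COMPRESSED_BATCHES);
        Set<Capability> shared = common(newer, older);
        System.out.println(shared); // [COMPRESSED_BATCHES]
        if (!shared.contains(Capability.ASYNC_CANCEL)) {
            System.out.println("falling back to the older cancel path");
        }
    }
}
```

Unlike a version window, this degrades feature by feature: two peers can still talk as long as each behavior they need is either shared or has a fallback.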

I also agree with Paul's mention of DRILL-4286: I don't think worrying about
rolling upgrades makes sense until we resolve the issues around decommissioning.

> Drill should do version check among drillbits
> ---------------------------------------------
>
>                 Key: DRILL-4596
>                 URL: https://issues.apache.org/jira/browse/DRILL-4596
>             Project: Apache Drill
>          Issue Type: New Feature
>    Affects Versions: 1.6.0
>            Reporter: Arina Ielchiieva
>             Fix For: Future
>
>
> Before registering a new drillbit in ZooKeeper, we should do a version check
> and make sure all the running drillbits are on the same version.
> Using drillbits of different versions can lead to unexpected results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
