We have discussed the removal of hadoop-1 and MR support in the Hive 2 line in 
the past.
Hadoop-1 removal seems to be non-controversial and on track; before we cut the 
first release of Hive 2, I propose we deprecate MR.

Tez and Spark engines provide vast perf improvements over MR.
For a long time, most contributors have done execution optimization work for 
these engines; that work is not portable to MR, so MR keeps falling further 
behind.
At the same time, supporting the additional code has other development costs: 
new features and bug fixes have to account for it, and we have to run tests for 
it both in Apache and for local changes, and deploy the code.

However, MR is hard to remove. It may also provide a baseline when 
investigating bugs in other engines (not a bulletproof one, since MR logic can 
itself be incorrect), or something to mock during perf benchmarks.

Therefore, I propose that for now we add deprecation warnings suggesting the 
other alternatives:

  *   to Hive configuration documentation.
  *   to Hive wiki.
  *   to release notes on Hive 2.
  *   in Beeline and CLI when using MR.
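
As a rough illustration of the last item, here is a minimal sketch of a check 
that Beeline or the CLI could run when a session starts. hive.execution.engine 
is the existing Hive property that selects the engine (mr, tez or spark); the 
MrDeprecation class, the warningFor method and the wording of the message are 
hypothetical and only meant to show the idea, not how it would actually be 
wired into Hive.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper for illustration only; not existing Hive code.
final class MrDeprecation {
    // Existing Hive property that selects the execution engine.
    static final String ENGINE_KEY = "hive.execution.engine";

    // Returns a warning when the configured engine is MR, otherwise empty.
    static Optional<String> warningFor(String engine) {
        if (engine != null
                && engine.trim().toLowerCase(Locale.ROOT).equals("mr")) {
            // Assumed wording; the real message would be agreed on the list.
            return Optional.of("WARNING: " + ENGINE_KEY
                    + "=mr is deprecated in Hive 2; consider tez or spark.");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Engine value passed on the command line; defaults to "mr" here.
        String engine = args.length > 0 ? args[0] : "mr";
        warningFor(engine).ifPresent(System.err::println);
    }
}
```

Users who see the warning can switch engines per session with 
"set hive.execution.engine=tez;" (or spark), provided that engine is set up on 
their cluster.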

Additionally, I propose we remove the Minimr test driver from HiveQA runs for 
master.

What do you think?
