Supporting multiple versions of Hadoop

Aaron McCurry Tue, 29 Oct 2013 19:52:34 -0700

There has been some interest lately in getting Blur to run successfully
on CDH4.  While I think the code will run correctly, I know that configuring
Blur in that environment is challenging.  There are other versions on the
horizon as well: HDP, CDH5 (at some point), IBM's version, and all
the official Apache versions from 0.20.x through 2.2.0.  Another big problem
beyond configuration is all the different supporting libraries that Hadoop
brings with it (or no longer brings, in the case of CDH4 and Hadoop 2.x).


So I propose that we make it a goal for 0.3.0 to support both legacy
1.x Hadoop and Hadoop 2.x.  I hope everyone has some ideas on how to
achieve this goal, but I will throw one out here and see what people think.

I believe that we need to isolate our dependency on Hadoop behind some
well-defined interfaces (not just Java interfaces): one primarily for
storage and another for write-ahead logging.  A modular approach, with a
classloader that isolates all the Hadoop dependencies from Blur, would also
give us the ability to update library versions that would normally collide
with versions in Hadoop, namely Jetty.
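
To make the idea a bit more concrete, here is a rough sketch of what the
isolation could look like.  Everything in it is hypothetical: the names
BlurStorage, WriteAheadLog, IsolatingClassLoader and HadoopModuleLoader do
not exist in Blur today, and InMemoryStorage is only a stand-in so the
sketch runs without Hadoop on the classpath.  The assumption is that each
Hadoop flavor ships as its own set of jars, loaded child-first so its
libraries win over whatever Blur itself uses.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical storage interface: the only view of the file system Blur core would see.
interface BlurStorage {
  void write(String path, byte[] data) throws IOException;
  byte[] read(String path) throws IOException;
}

// Hypothetical write-ahead log interface, kept separate from storage.
interface WriteAheadLog {
  void append(byte[] record) throws IOException;
  void sync() throws IOException;
}

// Stand-in implementation so the sketch runs without any Hadoop jars.
class InMemoryStorage implements BlurStorage {
  private final Map<String, byte[]> files = new ConcurrentHashMap<>();

  public void write(String path, byte[] data) {
    files.put(path, data.clone());
  }

  public byte[] read(String path) throws IOException {
    byte[] data = files.get(path);
    if (data == null) {
      throw new FileNotFoundException(path);
    }
    return data.clone();
  }
}

// Child-first loader: classes in the module jars win over the parent's copies,
// except for JDK classes and the shared interface types.
final class IsolatingClassLoader extends URLClassLoader {
  private final Set<String> sharedNames;

  IsolatingClassLoader(URL[] moduleJars, ClassLoader parent, Set<String> sharedNames) {
    super(moduleJars, parent);
    this.sharedNames = sharedNames;
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    synchronized (getClassLoadingLock(name)) {
      Class<?> loaded = findLoadedClass(name);
      if (loaded == null) {
        if (name.startsWith("java.") || sharedNames.contains(name)) {
          loaded = super.loadClass(name, false); // always delegate to the parent
        } else {
          try {
            loaded = findClass(name); // look in the module jars first
          } catch (ClassNotFoundException notInModule) {
            loaded = super.loadClass(name, false); // fall back to the parent
          }
        }
      }
      if (resolve) {
        resolveClass(loaded);
      }
      return loaded;
    }
  }
}

final class HadoopModuleLoader {
  // Loads a BlurStorage implementation by class name from the given module jars.
  // The loader is intentionally left open; it must outlive the returned instance.
  static BlurStorage loadStorage(URL[] moduleJars, String implClassName) {
    Set<String> shared = Set.of(BlurStorage.class.getName(), WriteAheadLog.class.getName());
    ClassLoader parent = BlurStorage.class.getClassLoader();
    IsolatingClassLoader loader = new IsolatingClassLoader(moduleJars, parent, shared);
    try {
      Class<? extends BlurStorage> impl =
          Class.forName(implClassName, true, loader).asSubclass(BlurStorage.class);
      return impl.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot load storage module " + implClassName, e);
    }
  }
}
```

With something like this, Blur core would only ever compile against the
two interfaces, and a Hadoop 1.x or 2.x module would be selected at startup
by pointing the loader at a different directory of jars.  A real module
class would need to be public with a public no-argument constructor, and a
write-ahead log module could be loaded the same way.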

Let me know what you think.

Thanks!

Aaron
