There has been some interest lately in getting Blur to run on CDH4. While I think the code will run correctly, I know that configuring Blur in that environment is challenging. There are other distributions on the horizon as well: HDP, CDH5 (at some point), IBM's distribution, and all the official Apache releases from 0.20.x through 2.2.0. Another big problem beyond configuration is the set of supporting libraries that Hadoop brings with it (or, in the case of CDH4 and Hadoop 2.x, no longer brings).
So I propose that for 0.3.0 we make it a goal to support both legacy Hadoop 1.x and Hadoop 2.x. I hope everyone has some ideas on how to achieve this, but I will throw one out here and see what people think. I believe we need to isolate our dependency on Hadoop behind some well-defined interfaces (and I'm not just talking about Java interfaces): primarily one for storage, and another for write-ahead logging. A modular approach, combined with a classloader that isolates all of the Hadoop dependencies from Blur, would also let us upgrade libraries whose versions would normally collide with the ones Hadoop ships, namely Jetty.

Let me know what you think.

Thanks!
Aaron
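To make the classloader part of the proposal concrete, here is a minimal Java sketch. It only covers the Java-interface and classloading side, not the broader module boundaries. All names (`BlurStorage`, `WriteAheadLog`, `ChildFirstClassLoader`, `PluginLoader`) are hypothetical and are not existing Blur classes. The sketch assumes each Hadoop line (1.x, 2.x) would ship as a plugin directory of jars containing an adapter plus its own Hadoop libraries. The loader looks in those jars first, except for the JDK and the shared interfaces, which must come from the parent so that Blur and the plugin agree on those types.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

// Hypothetical storage abstraction: Blur core would code against this,
// never against org.apache.hadoop.* types directly.
interface BlurStorage {
    InputStream open(String path) throws IOException;
    OutputStream create(String path) throws IOException;
    boolean exists(String path) throws IOException;
}

// Hypothetical write-ahead-log abstraction.
interface WriteAheadLog {
    void append(byte[] record) throws IOException;
    void sync() throws IOException;
}

/**
 * Child-first class loader: a class is looked up in the plugin's own jars
 * before the parent. Entries in "shared" are either package prefixes ending
 * in '.' (e.g. "java.") or exact class names; those always delegate to the
 * parent first, so both sides see the same Class objects for them.
 */
class ChildFirstClassLoader extends URLClassLoader {
    private final List<String> shared;

    ChildFirstClassLoader(URL[] urls, ClassLoader parent, List<String> shared) {
        super(urls, parent);
        this.shared = shared;
    }

    private boolean isShared(String name) {
        for (String p : shared) {
            if (p.endsWith(".") ? name.startsWith(p) : name.equals(p)) return true;
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !isShared(name)) {
                try {
                    c = findClass(name); // plugin jars first
                } catch (ClassNotFoundException e) {
                    // not in the plugin jars; fall through to the parent
                }
            }
            if (c == null) {
                c = super.loadClass(name, false); // standard parent-first lookup
            }
            if (resolve) resolveClass(c);
            return c;
        }
    }
}

// Hypothetical factory that instantiates a plugin implementation in isolation.
final class PluginLoader {
    private PluginLoader() {}

    static <T> T load(URL[] pluginJars, String implClassName, Class<T> api)
            throws ReflectiveOperationException {
        // The loader stays open for the lifetime of the plugin instance.
        ClassLoader loader = new ChildFirstClassLoader(pluginJars, api.getClassLoader(),
                List.of("java.", "javax.", BlurStorage.class.getName(), WriteAheadLog.class.getName()));
        Class<?> impl = Class.forName(implClassName, true, loader);
        return api.cast(impl.getDeclaredConstructor().newInstance());
    }
}
```

Usage would look something like `PluginLoader.load(hadoop2Jars, "some.hypothetical.Hadoop2Storage", BlurStorage.class)`. Blur's own Jetty would live only on the parent classpath, and a plugin's Hadoop jars (with whatever Jetty they bundle) would be resolved child-first from the plugin directory. The two versions would therefore not have to coexist on a single classpath.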
