Hi Josh,

hadoop-1.0.3 + hbase-0.94.0 + Crunch didn't work for me. HBase 0.94.0 requires avro-1.5.3 and doesn't compile with avro-1.7.0, but I think the real problem is its use of MethodUtils from commons-lang-2.5, which isn't in Hadoop's commons-lang-2.4.
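For anyone who wants to check this on their own setup, here's a minimal diagnostic sketch. It asks the class loader whether a class can be loaded and which JAR supplied it. The class name ClasspathProbe is made up, and the two commons-lang classes it probes by default are just the ones relevant to this thread; pass other class names as arguments.

```java
import java.security.CodeSource;

/**
 * Hypothetical diagnostic (not part of Hadoop, HBase, or Crunch): reports
 * whether a class can be loaded and which JAR or directory supplied it.
 */
public final class ClasspathProbe {

    /** Returns the location the class was loaded from, or a short status string. */
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // JDK classes loaded by the bootstrap loader have no CodeSource
            return source == null ? "bootstrap/unknown" : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException e) {
            return "not found";
        } catch (LinkageError e) {
            // the class exists, but something it depends on is missing or incompatible
            return "unloadable: " + e;
        }
    }

    public static void main(String[] args) {
        String[] classNames = args.length > 0 ? args : new String[] {
            // per this thread, HBase 0.94.0 needs this commons-lang 2.5 class
            "org.apache.commons.lang.reflect.MethodUtils",
            // a class that commons-lang 2.4 also has, to show which JAR is in use
            "org.apache.commons.lang.StringUtils"
        };
        for (String name : classNames) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run it with the same classpath your job gets (for example, packaged in a JAR and started via something like `hadoop jar`). If MethodUtils comes back "not found" while StringUtils resolves to a commons-lang-2.4 JAR, you're looking at the conflict described above.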
Of course, we could use hbase-0.90.5, downgrade Crunch to avro-1.3.3 and thrift-0.2.0, pray that Jersey is irrelevant, pin all other HBase dependencies to the versions Hadoop uses, and hope that it works. It may well work, but at the price of forcing some old versions on our users.

Whether it works isn't the main point, though. Let's look at it from the user's perspective. When you use a third-party framework like Hadoop, your application inherits the framework's classpath (*). This means that every other dependency your application has (including transitive dependencies) has to be compatible with the framework's dependencies. The more complex your application is, the more this hurts: you can't update your dependencies because the framework locks you in, and porting existing, complex applications to the framework becomes nearly impossible. I've seen this many times; that's why I evaluate my dependencies carefully.

Crunch itself is pretty minimal when it comes to its direct dependencies (we could be even more minimal with little effort). With HBase, however, things look a lot more difficult, and that's going to scare users away. If we have the chance to make HBase support an optional feature, much like MapReduce support is optional in Avro, then I think we should take it. Users are very thankful when you leave them a choice. I'm a user, I know: I've evaluated dozens of libraries and frameworks and dismissed quite a few because of dependency conflicts. If you're organized enough to have an evaluation checklist, this will be on it. I'd like to use Crunch in production one day without bending the rules, so let's lower the barrier to adoption.

Regards,
Matthias, stepping off the soap box

(*) Yes, I know about classloader isolation in Java EE and HADOOP_USER_CLASSPATH_FIRST.

On Sunday, 2012-08-05, Josh Wills wrote:
> Hey Matthias,
>
> I'm not quite willing to give up on HBase just yet. How does 1.0.3
> + Crunch look against HBase 0.94?
> Is the primary issue the Avro 1.7.0 conflicts?
>
> J
>
> On Sun, Aug 5, 2012 at 2:10 AM, Matthias Friedrich <[email protected]> wrote:
> > Hi,
> >
> > I spent most of Saturday resolving dependency conflicts for CRUNCH-16.
> > Since nobody's going to read a long mail, here are the cliff notes:
> >
> > hadoop-core-1.0.3, hbase-0.90.5, and avro-1.7.0 are incompatible, and
> > I found no safe way to fix it. Moving HBase support to a separate
> > Maven module may be the best solution because it reduces risk for
> > users who don't need HBase.
> >
> > The longer version:
> >
> > The POM of hadoop-core-1.0.3 is in a sorry state. It doesn't list all
> > the libraries that are on the runtime classpath, and some of the ones
> > it does list are wrong. For example, integration tests using
> > LocalJobRunner don't work unless you add more dependencies yourself
> > (e.g. commons-io). Also, roughly a dozen of hbase-0.90.5's 40
> > dependencies are in conflict with hadoop-core-1.0.3. This means we
> > have to add quite a few "provided" dependencies with the correct
> > versions ourselves, but these aren't propagated to our users, so they
> > have to do the same or risk conflicts at runtime.
> >
> > I resolved the conflicts to a point where our integration tests work,
> > which is unfortunately no guarantee that things will work for our
> > users. Built against the dependencies of hadoop-core-1.0.3 and Crunch,
> > the hbase-0.90.5 source distribution doesn't even compile. At an
> > interface level, it is incompatible with protobuf-java-2.4.1 (easy
> > enough to fix) and avro-1.7.0 (not so easy to fix). Changing only
> > those dependencies that are interface-compatible (about a dozen)
> > unsurprisingly leads to HBase test case failures. This may not affect
> > HBase clients, but you never know. There is no hbase-client library,
> > so you always get everything unless you know HBase well enough to get
> > your exclusions right.
> >
> > So, where do we go from here? I can get a patch ready that paints
> > over some of these problems and makes sure that the dependencies we
> > use in our test cases are the same as those used at runtime. But I
> > really need careful review for this.
> >
> > To be honest, this situation leaves me a bit uneasy. Maybe the best
> > long-term solution would be to move HBase support to a separate Maven
> > module that depends on Crunch core, rather than forcing it on everyone.
> > This would greatly reduce the risk for those who don't need HBase. I
> > think it's definitely worth a shot.
> >
> > What do you think, guys?
> >
> > Regards,
> > Matthias
