Hi,

I'd like to introduce you to Pangool <http://pangool.net/>, an easier low-level MapReduce API for Hadoop. I'm one of the developers. We just open-sourced it yesterday.

Pangool is a low-level Java MapReduce API with the same flexibility and performance as the plain Java Hadoop MapReduce API. The difference is that it makes a lot of things easier to code and understand.

A few of Pangool's features:

- Tuple-based intermediate serialization (allowing easier development).
- Built-in, easy-to-use group by and sort by (removing boilerplate code for things like secondary sort).
- Built-in, easy-to-use reduce-side joins (which are quite hard to implement in Hadoop).
- Augmented Hadoop API: built-in multiple inputs / outputs, configuration via object instance.

Pangool aims to make Hadoop's steep learning curve a lot smoother while retaining all of its features, power and flexibility. It differs from high-level tools like Pig or Hive in that it can be used as a replacement for the low-level API. There is no performance / flexibility penalty for using Pangool. We ran an initial benchmark <http://pangool.net/benchmark.html> to demonstrate this.

I'd be very interested in hearing your feedback, opinions and questions on it.

Cheers,
Pere.