More than a year after the previous release, I am proud to announce a new version of HAMAKE. Based on our experience using it, we rewrote it in Java and added support for Amazon EMR. We also streamlined the XML syntax and updated and improved the documentation. Please visit http://code.google.com/p/hamake/ to learn more and to download the new version.
Brief description:

Most non-trivial data processing scenarios with Hadoop require more than one MapReduce job. Such processing is usually data-driven, with the data funneled through a sequence of jobs. The processing model can be described in terms of dataflow programming and expressed as a directed graph with datasets as nodes. Each edge indicates a dependency between two or more datasets and is associated with a processing instruction (a Hadoop MapReduce job, a Pig Latin script, or an external command) that produces one dataset from the others.

By using fuzzy timestamps to detect when a dataset needs to be updated, we can calculate the order in which tasks must be executed to bring all datasets up to date. Jobs that update independent datasets can be executed concurrently, taking advantage of your Hadoop cluster's full capacity. The dependency graph may even contain cycles, leading to dependency loops that can be resolved using dataset versioning.

These ideas inspired the creation of the HAMAKE utility. We tried to emphasize data and to let developers express their goals in terms of dataflow (rather than workflow). The data dependency graph is expressed using just two dataflow instructions, fold and foreach, which provide a clear processing model similar to MapReduce, but at the dataset level. Another design goal was to create a simple-to-use utility that developers can start using right away, without complex installation or extensive learning.

Key Features:

* Lightweight utility: no need for complex installation
* Based on the dataflow programming model
* Easy learning curve
* Supports Amazon Elastic MapReduce
* Runs MapReduce jobs as well as Pig Latin scripts

Sincerely,
Vadim Zaliva
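To make the timestamp-driven model from the brief description more concrete, here is a minimal Java sketch (Java 16+ for records). It is not HAMAKE's code or API: `DataflowSketch`, `Rule`, `plan` and `isStale` are hypothetical names. Each rule is modeled as a fold (several inputs, one output). foreach is not modeled, and cycles are rejected rather than resolved through versioning. "Fuzzy" comparison is assumed here to mean a fixed tolerance window. The sketch orders rules with Kahn's topological sort and marks a rule stale when its output is missing, when one of its inputs is being rebuilt, or when an input is newer than the output by more than the tolerance. It then groups the jobs to run into waves whose members could execute concurrently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of timestamp-driven dataflow planning; not HAMAKE's implementation. */
public class DataflowSketch {

    /** A fold-style rule: builds one output dataset from one or more input datasets. */
    public record Rule(String name, List<String> inputs, String output) {}

    /**
     * Decides which rules must run and groups them into waves. Rules in the same wave
     * do not depend on each other, so they could be executed concurrently.
     *
     * @param timestamps last-modified time of each existing dataset (absent key = missing dataset)
     * @param tolerance  ASSUMED meaning of "fuzzy": an input only counts as newer than the
     *                   output if it is newer by more than this many time units
     */
    public static List<List<String>> plan(List<Rule> rules, Map<String, Long> timestamps, long tolerance) {
        // Which rule produces each dataset (outputs and rule names are assumed unique).
        Map<String, Rule> producer = new HashMap<>();
        for (Rule r : rules) producer.put(r.output(), r);

        // Kahn's algorithm bookkeeping: unvisited upstream rules per rule, consumers per dataset.
        Map<String, Integer> pending = new HashMap<>();
        Map<String, List<Rule>> consumers = new HashMap<>();
        for (Rule r : rules) {
            int upstream = 0;
            for (String in : r.inputs()) {
                if (producer.containsKey(in)) {
                    upstream++;
                    consumers.computeIfAbsent(in, k -> new ArrayList<>()).add(r);
                }
            }
            pending.put(r.name(), upstream);
        }

        List<Rule> current = new ArrayList<>();
        for (Rule r : rules) if (pending.get(r.name()) == 0) current.add(r);

        List<List<String>> waves = new ArrayList<>();
        Set<String> rebuilt = new HashSet<>(); // outputs scheduled for regeneration so far
        int visited = 0;
        while (!current.isEmpty()) {
            List<String> toRun = new ArrayList<>();
            List<Rule> next = new ArrayList<>();
            for (Rule r : current) {
                visited++;
                if (isStale(r, timestamps, rebuilt, tolerance)) {
                    toRun.add(r.name());
                    rebuilt.add(r.output());
                }
                // Release downstream rules once all of their producers have been visited.
                for (Rule c : consumers.getOrDefault(r.output(), List.of())) {
                    if (pending.merge(c.name(), -1, Integer::sum) == 0) next.add(c);
                }
            }
            if (!toRun.isEmpty()) waves.add(toRun);
            current = next;
        }
        // This sketch simply rejects cycles (no dataset versioning).
        if (visited < rules.size()) throw new IllegalStateException("dependency cycle detected");
        return waves;
    }

    /** Stale if the output is missing, an input is being rebuilt, or an input is newer. */
    static boolean isStale(Rule r, Map<String, Long> timestamps, Set<String> rebuilt, long tolerance) {
        Long out = timestamps.get(r.output());
        if (out == null) return true;
        for (String in : r.inputs()) {
            if (rebuilt.contains(in)) return true;
            Long t = timestamps.get(in);
            if (t != null && t > out + tolerance) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule("clean", List.of("raw"), "cleaned"),
                new Rule("countWords", List.of("cleaned"), "wordCounts"),
                new Rule("countUsers", List.of("cleaned"), "userCounts"),
                new Rule("report", List.of("wordCounts", "userCounts"), "report"));
        Map<String, Long> timestamps = Map.of("raw", 500L, "cleaned", 100L,
                "wordCounts", 200L, "userCounts", 200L, "report", 300L);
        // prints [[clean], [countWords, countUsers], [report]]
        System.out.println(plan(rules, timestamps, 10));
    }
}
```

In this made-up example the raw data is newer than the cleaned dataset, so everything downstream is rebuilt, and the two independent counting jobs land in the same wave, mirroring the point above about running independent updates concurrently.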
