Hi,

we've just released version 0.3.6_rc2 of Pydoop (http://pydoop.sourceforge.net). Pydoop is a Python MapReduce and HDFS API for Hadoop, built on the C++ Pipes and C libhdfs APIs, that lets you write full-fledged MapReduce applications with HDFS access. Its key features are:

* access to most MapReduce application components: Mapper, Reducer, RecordReader, RecordWriter, Partitioner;
* direct access to JobConf parameters; support for counters and status messages;
* CPython implementation: any Python module can be used, either pure Python or C/C++ extension (note that this is not possible with Jython);
* direct HDFS access from Python.

With Pydoop you can write complete applications in Python, using a programming style that's very similar to the one supported by the Java and C++ APIs: developers define classes that are instantiated and used by the framework. This allows for much cleaner and faster [1] code with respect to the traditional Python + Streaming approach.

See http://pydoop.sourceforge.net/docs/examples for a collection of Pydoop usage examples, including a complete application that leverages the Hadoop Distributed Cache to distribute all required Python packages, including Pydoop itself, to Hadoop cluster nodes.

Pydoop is actively used in production at our site, mostly for data-intensive biocomputing applications. The 0.3.6_rc2 release is being used internally in production. We'd greatly appreciate any kind of feedback before we release it as 0.3.6 (stable), which we expect to do within two weeks or so.

Links:

* download page: http://sourceforge.net/projects/pydoop/files
* release notes: http://sourceforge.net/apps/mediawiki/pydoop/index.php?title=Release_Notes

[1] Simone Leo and Gianluigi Zanetti. "Pydoop: a Python MapReduce and HDFS API for Hadoop". In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC 2010), pages 819–825. ACM, 2010.

--
Simone Leo
Distributed Computing group
Advanced Computing and Communications program
CRS4
POLARIS - Building #1
Piscina Manna
I-09010 Pula (CA) - Italy
e-mail: [email protected]
http://www.crs4.it
