I've been using the Hadoop project off and on for the last year in some ongoing work studying Wikipedia. One of the tasks I developed computes the revision-to-revision diff across all edits in the Wikipedia history. From the time I first developed the job (last summer) to the latest run (last week, on the 0.13.0 release), I've seen a remarkable increase in performance. Even though the input size has more than doubled, the time to run the job on Hadoop has dropped by half, for a roughly 4x overall improvement in throughput. Thanks everyone!
-- Bryan A. P. Pendleton Ph: (877) geek-1-bp
