I've been using the Hadoop project off and on for the last year in some
ongoing work studying Wikipedia. One of the tasks I developed computes the
revision-to-revision diff across all edits in the Wikipedia history. From
the time I first developed the job (last summer) to the most recent run
(last week, on the 0.13.0 release), I've seen a remarkable increase in
performance. Even though the input size has more than doubled, the time
to run the job on Hadoop has dropped by half: more than twice the data in
half the time works out to roughly a 4x improvement in throughput.
Thanks everyone!
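
For anyone curious, here's a minimal sketch of the general shape of such
a job (not my actual code): group revisions by article, order them by
revision id, and diff each consecutive pair. It uses the modern
org.apache.hadoop.mapreduce API rather than the 0.13.0-era
org.apache.hadoop.mapred one, and the tab-separated input layout and the
length-delta "diff" are stand-ins for the real thing.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical job: group revisions by article, sort by revision id,
// and emit a diff between each consecutive pair of revisions.
public class RevisionDiff {

  // Assumes one revision per line: "articleId \t revisionId \t wikitext".
  public static class RevisionMapper extends Mapper<Object, Text, Text, Text> {
    private final Text article = new Text();
    private final Text revision = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] fields = value.toString().split("\t", 3);
      if (fields.length < 3) return; // skip malformed records
      article.set(fields[0]);
      revision.set(fields[1] + "\t" + fields[2]); // revisionId + text
      context.write(article, revision);
    }
  }

  // Buffers an article's revisions, sorts by revision id, diffs neighbors.
  // At Wikipedia scale you would use a secondary sort instead of buffering,
  // since popular articles have many thousands of revisions.
  public static class DiffReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text article, Iterable<Text> revisions, Context context)
        throws IOException, InterruptedException {
      java.util.List<String[]> revs = new java.util.ArrayList<>();
      for (Text r : revisions) {
        revs.add(r.toString().split("\t", 2));
      }
      revs.sort((a, b) -> Long.compare(Long.parseLong(a[0]), Long.parseLong(b[0])));
      for (int i = 1; i < revs.size(); i++) {
        String diff = diff(revs.get(i - 1)[1], revs.get(i)[1]);
        context.write(article, new Text(revs.get(i)[0] + "\t" + diff));
      }
    }

    // Placeholder: a real job would use a proper text-diff algorithm
    // (e.g. Myers) rather than this length delta.
    private static String diff(String before, String after) {
      return "delta=" + (after.length() - before.length());
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "wikipedia revision diff");
    job.setJarByClass(RevisionDiff.class);
    job.setMapperClass(RevisionMapper.class);
    job.setReducerClass(DiffReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}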

--
Bryan A. P. Pendleton
Ph: (877) geek-1-bp