cool job

2008/9/24 Colin Evans <[EMAIL PROTECTED]>
> Freebase is finally open-sourcing our Jython-based framework for writing
> map-reduce jobs on Hadoop. Happy tightly embeds Jython into the Hadoop
> APIs, files off a lot of the sharp edges, and makes writing map-reduce
> programs a breeze. This is the 0.1 release, but we've been using Happy at
> Freebase for a while, so it is stable and full-featured. Take a look and
> let me know if it is useful.
>
> The project and docs are here:
>
> http://code.google.com/p/happy/
> http://www.mqlx.com/~colin/happy.html
>
> Here's an example word count program written in Happy:
>
> ---
> import sys, happy, happy.log
>
> happy.log.setLevel("debug")
> log = happy.log.getLog("wordcount")
>
> class WordCount(happy.HappyJob):
>     def __init__(self, inputpath, outputpath):
>         happy.HappyJob.__init__(self)
>         self.inputpaths = inputpath
>         self.outputpath = outputpath
>         self.inputformat = "text"
>
>     def map(self, records, task):
>         # emit (word, "1") for every whitespace-separated word
>         for _, value in records:
>             for word in value.split():
>                 task.collect(word, "1")
>
>     def reduce(self, key, values, task):
>         # count the values collected for this word
>         count = 0
>         for _ in values:
>             count += 1
>         task.collect(key, str(count))
>         log.debug(key + ":" + str(count))
>         # accumulate job-level totals in happy.results
>         happy.results["words"] = happy.results.setdefault("words", 0) + count
>         happy.results["unique"] = happy.results.setdefault("unique", 0) + 1
>
> if __name__ == "__main__":
>     if len(sys.argv) < 3:
>         print "Usage: <inputpath> <outputpath>"
>         sys.exit(-1)
>     wc = WordCount(sys.argv[1], sys.argv[2])
>     results = wc.run()
>     print str(sum(results["words"])) + " total words"
>     print str(sum(results["unique"])) + " unique words"
> ---
>
>
> Thanks
> Colin
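
For anyone who wants to see what the word count job computes without a Hadoop cluster, here is a rough plain-Python sketch of the same map, group-by-key, and reduce flow. It does not use Happy or Hadoop at all. The function names (map_records, shuffle, reduce_group, word_count) and the (line number, line text) record shape are assumptions made up for illustration, not part of Happy's API.

```python
# Plain-Python sketch of the word count data flow; no Happy or Hadoop involved.
from collections import defaultdict

def map_records(records):
    """Emit a (word, "1") pair for every whitespace-separated word."""
    for _, value in records:
        for word in value.split():
            yield word, "1"

def shuffle(pairs):
    """Group emitted values by key, the step a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_group(key, values):
    """Count the values collected for one key."""
    return key, sum(1 for _ in values)

def word_count(lines):
    # Assumed record shape: (line number, line text) pairs.
    records = enumerate(lines)
    groups = shuffle(map_records(records))
    return dict(reduce_group(k, v) for k, v in groups.items())

counts = word_count(["the quick brown fox", "the lazy dog the end"])
total_words = sum(counts.values())   # analogous to the "words" total in the job
unique_words = len(counts)           # analogous to the "unique" total in the job
```

Here total_words and unique_words play the role of the two totals the job prints at the end; on a real cluster those would be gathered across many reduce tasks rather than from one in-memory dict.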
