cool job

2008/9/24 Colin Evans <[EMAIL PROTECTED]>

> Freebase is finally open-sourcing our Jython-based framework for writing
> map-reduce jobs on Hadoop.  Happy tightly embeds Jython into the Hadoop
> APIs, files off a lot of the sharp edges, and makes writing map-reduce
> programs a breeze.  This is the 0.1 release, but we've been using Happy at
> Freebase for a while, so it is stable and full-featured.  Take a look and
> let me know if it is useful.
>
> The project and docs are here:
>
> http://code.google.com/p/happy/
> http://www.mqlx.com/~colin/happy.html
>
> Here's an example word count program written in Happy:
>
> ---
> import sys, happy, happy.log
>
> happy.log.setLevel("debug")
> log = happy.log.getLog("wordcount")
>
> class WordCount(happy.HappyJob):
>   def __init__(self, inputpath, outputpath):
>       happy.HappyJob.__init__(self)
>       self.inputpaths = inputpath
>       self.outputpath = outputpath
>       self.inputformat = "text"
>
>   def map(self, records, task):
>       for _, value in records:
>           for word in value.split():
>               task.collect(word, "1")
>
>   def reduce(self, key, values, task):
>       count = 0
>       for _ in values:
>           count += 1
>       task.collect(key, str(count))
>       log.debug(key + ":" + str(count))
>       happy.results["words"] = happy.results.setdefault("words", 0) + count
>       happy.results["unique"] = happy.results.setdefault("unique", 0) + 1
>
> if __name__ == "__main__":
>   if len(sys.argv) < 3:
>       print "Usage: <inputpath> <outputpath>"
>       sys.exit(-1)
>   wc = WordCount(sys.argv[1], sys.argv[2])
>   results = wc.run()
>   print str(sum(results["words"])) + " total words"
>   print str(sum(results["unique"])) + " unique words"
> ---
>
>
> Thanks
> Colin
>
>
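
For anyone who wants to follow the data flow of the quoted word count job without a Hadoop cluster, here is a rough plain-Python sketch. It does not use Happy or Hadoop at all: FakeTask, wc_map, wc_reduce and run_local are hypothetical names made up for illustration, and the in-memory grouping only stands in for the shuffle step that Hadoop performs between map and reduce.

```python
from collections import defaultdict


class FakeTask(object):
    """Hypothetical stand-in for the task object; it only records collected pairs."""

    def __init__(self):
        self.collected = []

    def collect(self, key, value):
        self.collected.append((key, value))


def wc_map(records, task):
    # Same logic as the quoted map(): emit (word, "1") for every word.
    for _, value in records:
        for word in value.split():
            task.collect(word, "1")


def wc_reduce(key, values, task):
    # Same logic as the quoted reduce(): count the values seen for one key.
    count = 0
    for _ in values:
        count += 1
    task.collect(key, str(count))


def run_local(lines):
    """Run map, an in-memory group-by-key, then reduce; return {word: count}."""
    map_task = FakeTask()
    # Assumption: records are (key, line) pairs; the key is ignored by wc_map.
    wc_map(enumerate(lines), map_task)

    # Group the mapped values by key, standing in for the shuffle phase.
    grouped = defaultdict(list)
    for key, value in map_task.collected:
        grouped[key].append(value)

    reduce_task = FakeTask()
    for key in sorted(grouped):
        wc_reduce(key, grouped[key], reduce_task)
    return dict((k, int(v)) for k, v in reduce_task.collected)


if __name__ == "__main__":
    counts = run_local(["the quick brown fox", "the lazy dog"])
    print(counts["the"])
```

The sketch keeps the map and reduce bodies as close to the quoted ones as possible, so the only new part is the grouping step in run_local.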
